Abstract This paper considers a quadratically-constrained cardinality minimization 
problem with applications to digital filter design, subset selection for linear regres- 
sion, and portfolio selection. Two relaxations are investigated: the continuous relax- 
ation of a mixed integer formulation, and an optimized diagonal relaxation that ex- 
ploits a simple special case of the problem. For the continuous relaxation, an absolute 
upper bound on the optimal cost is derived, suggesting that the continuous relaxation 
tends to be a relatively poor approximation. In computational experiments, diagonal 
relaxations often provide stronger bounds than continuous relaxations and can greatly 
reduce the complexity of a branch-and-bound solution, even in instances that are not 
particularly close to diagonal. Similar gains are observed with respect to the mixed 
integer programming solver CPLEX. Motivated by these results, the approximation 
properties of the diagonal relaxation are analyzed. In particular, bounds on the ap- 
proximation ratio are established in terms of the eigenvalues of the matrix defining 
the quadratic constraint, and also in the diagonally dominant and nearly coordinate- 
aligned cases. 

Keywords Cardinality minimization • Mixed integer quadratic programming • 
Relaxation methods • Subset selection • Portfolio optimization 

Mathematics Subject Classification (2000) 90C11 90C57 90C59 

1 Introduction 

This paper considers the problem of minimizing the cardinality of a vector $\mathbf{x} \in \mathbb{R}^N$
subject to a single convex quadratic constraint:

$$\min_{\mathbf{x}} \; C(\mathbf{x}) \quad \text{s.t.} \quad (\mathbf{x} - \mathbf{c})^T \mathbf{Q} (\mathbf{x} - \mathbf{c}) \le \gamma, \tag{1.1}$$

where $C(\mathbf{x})$ is the number of nonzero components of $\mathbf{x}$, $\mathbf{Q}$ is a positive definite matrix,
and $\gamma$ is a positive scalar. Geometrically, problem (1.1) corresponds to finding a point
of minimal cardinality in an ellipsoid, denoted as $\mathcal{E}_{\mathbf{Q}}$, centered at the point $\mathbf{c}$. The
orientation and relative lengths of the ellipsoid axes are determined by the eigenvectors
and eigenvalues of $\mathbf{Q}$, while $\gamma$ determines its absolute size.

The author's interest in (1.1) stems from the design of digital filters in signal
processing (see [27, 28] and the references therein). In this context, $\mathbf{x}$ represents a vector
of filter coefficients and cardinality minimization is motivated by the fact that the 
cost of implementing a filter is often dominated by arithmetic operations, especially 
in hardware. The quadratic constraint represents a requirement on filter performance, 
for example a specified fidelity in approximating a desired frequency response or a 
bound on recovery error in the equalization of communication channels. 

Problem (1.1) also has applications to subset selection for linear regression [11, 21],
more specifically the overdetermined case in which $\mathbf{Q}$ is positive definite and
less so the underdetermined case in which $\mathbf{Q}$ is rank-deficient and control of the
cardinality is employed as a regularization. A similar problem arises in optimal
linear-quadratic control with cardinality-constrained input [14] (see also [20] for optimal
control with sparse state-feedback gains). A problem related to (1.1) has been studied
extensively in cardinality-constrained financial portfolio optimization
[2, 4, 5, 10, 12, 16, 22, 25]. The portfolio optimization problem however has additional linear
constraints, most notably non-negativity, upper bounds on nonzero variables, and
sometimes lower bounds as well. There is some computational evidence [2] to suggest that
the relative lack of constraints in (1.1) increases the difficulty of the problem, at least
when approached using conventional integer optimization methods.

Certain cases of (1.1) are known to be efficiently solvable, the simplest of which is
the case of diagonal $\mathbf{Q}$. Extensions to block-diagonal, tridiagonal, and well-conditioned
$\mathbf{Q}$ are discussed in [28]. The authors of [11] present polynomial algorithms for
several additional cases, including an FPTAS for the general banded case and exact
algorithms for the cases of a tree-structured covariance graph, a large independent set
("arrow"-structured $\mathbf{Q}$), and exponential decay in the entries of $\mathbf{Q}$ away from the
diagonal. The case in which nearly all of the eigenvalues of $\mathbf{Q}$ are identical and larger
than the rest is treated in [15].

In the general case, (1.1) is a difficult combinatorial optimization problem.
Several heuristics such as forward and backward greedy selection can be used, often
with good results (see e.g. [28], also [29] for references on portfolio selection
heuristics). Although approximation guarantees do exist for forward selection in the
near-diagonal case [11] and for backward selection when a (difficult to evaluate) threshold
test is met [9], more general guarantees for heuristics are not available. Thus if a
certificate of optimality or a bound on the deviation from optimality is desired,
branch-and-bound remains the method of choice and has therefore been considered by many
researchers [2, 4, 5, 22, 25]. In particular, [4] investigates a branch-and-cut algorithm
employing disjunctive cuts and finds that such cuts are ineffective when $\mathbf{Q}$ is near full
rank. In [2], Lemke's pivoting method is used to provide warm starts in solving
continuous relaxations. Lagrangian relaxations have also been considered [22]. In [25], a
lifted polyhedral relaxation is applied to mixed-integer second-order cone programs
(which include the problems considered here) to take advantage of the more mature
techniques for solving mixed-integer linear programs.

The complexity of branch-and-bound can be significantly reduced if specialized
relaxations are available that can better approximate the original optimal cost
while remaining efficiently solvable. Such relaxations permit increased pruning of
the branch-and-bound tree and can also suggest stronger reformulations of the
original problem. In the present context, a sequence of works [10, 12, 13, 16, 29] have
developed the perspective relaxation, so-called because of its relationship to the
perspective of a convex function. The perspective relaxation can also be viewed as a
particularly tractable instance of disjunctive convex optimization [7]. In [12, 16], the
relaxation is derived for general convex functions (not necessarily quadratic) using a
convex hull approach; [12] emphasizes the identification of linear cuts whereas [16]
proposes solving the nonlinear relaxation directly, aided by second-order cone
representations. In contrast, [10] focuses on portfolio optimization and derives the
relaxation through Lagrangian decomposition. The authors of [10, 12, 16] also show that
the perspective relaxation is tighter than the standard continuous relaxation in certain
contexts. To apply the relaxation to portfolio optimization problems, a diagonal
matrix must be separated from $\mathbf{Q}$; a semidefinite programming method for determining
the best separation was reported very recently [29] and is shown to outperform
simpler methods in [12, 13]. None of the above works however have analyzed the quality
of approximation of the relaxation with respect to the original problem.

In this paper, we focus on the pure quadratically-constrained problem (1.1) and
investigate two relaxations. The first is the conventional continuous relaxation,
obtained by formulating (1.1) as a mixed-integer optimization and relaxing binary-value
constraints to unit interval constraints. An absolute upper bound is given on the
optimal cost of the continuous relaxation. The bound suggests that the continuous
relaxation is relatively weak for many instances of (1.1), a hypothesis borne out by
numerical experiments. The second relaxation exploits the simplicity of the case of
diagonal $\mathbf{Q}$, specifically by constructing the best diagonal approximation to (1.1),
referred to as a diagonal relaxation. A computational comparison of the two relaxations
shows that diagonal relaxations often yield significantly stronger bounds and can
greatly decrease the complexity of a branch-and-bound solution to (1.1), by orders of
magnitude in difficult instances, and even when $\mathbf{Q}$ does not seem close to diagonal.
Similar efficiency gains are seen relative to the mixed-integer programming solver
CPLEX [18]. Motivated by these results, this paper undertakes a theoretical
analysis of diagonal relaxations, providing approximation guarantees for certain classes of
instances and general insight into when diagonal relaxations are expected to be
successful. In particular, bounds on the approximation ratio are derived in terms of the
eigenvalues of $\mathbf{Q}$ and in the cases of diagonally dominant $\mathbf{Q}$ and nearly
coordinate-aligned $\mathcal{E}_{\mathbf{Q}}$. We note that a relaxation similar to the diagonal relaxation was proposed
independently in [14] with similarly positive computational experience. A principal
objective of the current paper is to support such findings with more detailed analysis.

We begin in Sect. 2 by deriving some preliminary facts pertaining to problem
(1.1). In Sect. 3, continuous relaxations of (1.1) are discussed and analyzed, while
the same is done for diagonal relaxations in Sect. 4. In Sect. 5, the two relaxations are
compared numerically in terms of their approximation ratios and effect on
branch-and-bound complexity. A comparison with CPLEX is also reported. The paper
concludes in Sect. 6.

1.1 Notation 

Vectors and matrices are denoted using lowercase and uppercase boldface letters, with
$x_n$ representing the nth element of a vector $\mathbf{x}$ and $Q_{mn}$ the $(m,n)$ element of a matrix
$\mathbf{Q}$. The letter $\mathbf{e}$ is reserved for a vector of unit entries. For sets of indices $Y$ and $Z$,
$\mathbf{x}_Y$ represents the $|Y|$-dimensional subvector of $\mathbf{x}$ corresponding to $Y$ and $\mathbf{Q}_{YZ}$ the
$|Y| \times |Z|$ submatrix of $\mathbf{Q}$ with rows indexed by $Y$ and columns indexed by $Z$. The
notation $\mathbf{Q} \succeq 0$ ($\mathbf{Q} \succ 0$) indicates that $\mathbf{Q}$ is positive semidefinite (positive definite);
$\mathbf{Q} \succeq \mathbf{D}$ is equivalent to $\mathbf{Q} - \mathbf{D} \succeq 0$. The nth smallest eigenvalue of $\mathbf{Q}$ is written $\lambda_n(\mathbf{Q})$
except as noted in Sect. 4.5; we also use $\lambda_{\min}(\mathbf{Q})$ and $\lambda_{\max}(\mathbf{Q})$ for the smallest and
largest eigenvalues.

2 Preliminaries 

In this section, some facts related to problem (1.1) are derived for later use. In Sect. 2.1,
a condition is given for the feasibility of solutions of specified cardinality. In Sect. 2.2,
it is shown that variables that are either constrained to a zero value or assumed to be
nonzero can be eliminated to yield a lower-dimensional instance of (1.1).

2.1 Feasibility of solutions of specified cardinality 

First we obtain a condition for the existence of feasible solutions to (1.1) with a
specified number $K$ of zero-valued components. Suppose that $x_n$ is constrained to
a zero value for $n$ in a set $Z$ of size $K$. With $Y$ denoting the complement of $Z$, the
constraint in (1.1) becomes

$$(\mathbf{x}_Y - \mathbf{c}_Y)^T \mathbf{Q}_{YY} (\mathbf{x}_Y - \mathbf{c}_Y) - 2 \mathbf{c}_Z^T \mathbf{Q}_{ZY} (\mathbf{x}_Y - \mathbf{c}_Y) + \mathbf{c}_Z^T \mathbf{Q}_{ZZ} \mathbf{c}_Z \le \gamma. \tag{2.1}$$

Consider minimizing the left-hand side of (2.1) with respect to $\mathbf{x}_Y$, with solution
$\mathbf{x}_Y - \mathbf{c}_Y = (\mathbf{Q}_{YY})^{-1} \mathbf{Q}_{YZ} \mathbf{c}_Z$. If (2.1) is not satisfied when the left-hand side is minimized,
then it cannot be satisfied for any value of $\mathbf{x}_Y$. Hence a feasible solution to (1.1) exists
subject to $x_n = 0$ for $n \in Z$ if and only if

$$\mathbf{c}_Z^T (\mathbf{Q}/\mathbf{Q}_{YY}) \mathbf{c}_Z \le \gamma, \tag{2.2}$$

where $\mathbf{Q}/\mathbf{Q}_{YY} = \mathbf{Q}_{ZZ} - \mathbf{Q}_{ZY} (\mathbf{Q}_{YY})^{-1} \mathbf{Q}_{YZ} = \left( (\mathbf{Q}^{-1})_{ZZ} \right)^{-1}$ is the Schur complement
of $\mathbf{Q}_{YY}$. Condition (2.2) may be generalized to encompass all subsets of cardinality $K$
using a similar argument: If (2.2) is not satisfied when the left-hand side is minimized
over all subsets $Z$ of cardinality $K$, then there can be no solution to (1.1) with $K$
zero-valued components. This yields the condition

$$E_0(K) = \min_{|Z| = K} \left\{ \mathbf{c}_Z^T (\mathbf{Q}/\mathbf{Q}_{YY}) \mathbf{c}_Z \right\} \le \gamma \tag{2.3}$$

for the existence of a feasible solution of cardinality $N - K$. In general, computing
$E_0(K)$ in (2.3) involves an intractable combinatorial optimization. However, when $\mathbf{Q}$
has special structure, $E_0(K)$ becomes much easier to evaluate and it is in these cases
that condition (2.3) will be used.

In the special case of a single zero-value constraint, i.e., $Z = \{n\}$, condition (2.2)
reduces to

$$\frac{c_n^2}{(\mathbf{Q}^{-1})_{nn}} \le \gamma. \tag{2.4}$$

If (2.4) is not satisfied, then it is not feasible for $x_n$ to take a value of zero. It
follows that an easily computed lower bound on the optimal cost in (1.1) is obtained
by counting the number of indices $n$ for which (2.4) is not satisfied. Furthermore, in
Sect. 2.2 it is shown that the variables $x_n$ corresponding to violations of (2.4) can be
eliminated from the problem to reduce its dimension.
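
The counting bound just described can be illustrated with a short Python sketch. It is not the implementation used for the experiments in this paper: the helper names `invert` and `forced_nonzero_indices` are ours, the inverse is computed by plain Gauss-Jordan elimination (adequate only for small, well-conditioned matrices), and the single-variable condition is taken in the form $c_n^2 / (\mathbf{Q}^{-1})_{nn} \le \gamma$, which follows from (2.2) with $Z = \{n\}$.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [value / p for value in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def forced_nonzero_indices(Q, c, gamma):
    """Indices n for which x_n = 0 is infeasible, i.e. c_n^2 / (Q^{-1})_nn > gamma."""
    Q_inv = invert(Q)
    return [n for n, c_n in enumerate(c) if c_n ** 2 / Q_inv[n][n] > gamma]


# Illustrative data (not from the paper): (Q^{-1})_nn = 2/3 for both n.
Q = [[2.0, 1.0], [1.0, 2.0]]
forced = forced_nonzero_indices(Q, [0.5, 1.0], gamma=1.0)
lower_bound = len(forced)   # lower bound on the optimal cardinality
```

In this toy instance only the second variable violates the condition, so the bound equals 1; the variables returned by `forced_nonzero_indices` are the ones that Sect. 2.2 eliminates.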



2.2 Variable elimination 



We now consider restrictions of problem (1.1) in which certain variables are
constrained to zero while others are assumed to be nonzero. These two types of
constraints arise in branch-and-bound as (1.1) is divided recursively into subproblems.
Variables that must be non-zero to maintain feasibility may also be identified through
condition (2.4).

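Let $Z$ denote as before the subset of variables constrained to zero, $U$ the subset
of variables assumed to be nonzero, and $F$ the remainder. We show that an arbitrary
subproblem defined by subsets $(Z, U, F)$ can be reduced to the following problem:

$$\min_{\mathbf{x}_F} \; |U| + C(\mathbf{x}_F) \quad \text{s.t.} \quad (\mathbf{x}_F - \mathbf{c}_{\mathrm{eff}})^T \mathbf{Q}_{\mathrm{eff}} (\mathbf{x}_F - \mathbf{c}_{\mathrm{eff}}) \le \gamma_{\mathrm{eff}}, \tag{2.5}$$

with effective parameters given by

$$\mathbf{Q}_{\mathrm{eff}} = \mathbf{Q}_{FF} - \mathbf{Q}_{FU} (\mathbf{Q}_{UU})^{-1} \mathbf{Q}_{UF}, \tag{2.6a}$$

$$\mathbf{c}_{\mathrm{eff}} = \mathbf{c}_F + (\mathbf{Q}_{\mathrm{eff}})^{-1} \left( \mathbf{Q}_{FZ} - \mathbf{Q}_{FU} (\mathbf{Q}_{UU})^{-1} \mathbf{Q}_{UZ} \right) \mathbf{c}_Z, \tag{2.6b}$$

$$\gamma_{\mathrm{eff}} = \gamma - \mathbf{c}_Z^T (\mathbf{Q}/\mathbf{Q}_{YY}) \mathbf{c}_Z. \tag{2.6c}$$

Problem (2.5) is an instance of (1.1) with $|F|$ variables instead of $N$.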

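The reduction can be carried out in the two steps
$(\emptyset, \emptyset, \{1, \ldots, N\}) \to (Z, \emptyset, U \cup F) \to (Z, U, F)$. In the first step, the constraints $x_n = 0$ for $n \in Z$ reduce $C(\mathbf{x})$ to
$C(\mathbf{x}_Y)$ and the quadratic constraint in (1.1) to (2.1). By completing the square, (2.1)
can be rewritten as

$$\begin{bmatrix} \mathbf{x}_U - \mathbf{c}'_U \\ \mathbf{x}_F - \mathbf{c}'_F \end{bmatrix}^T \begin{bmatrix} \mathbf{Q}_{UU} & \mathbf{Q}_{UF} \\ \mathbf{Q}_{FU} & \mathbf{Q}_{FF} \end{bmatrix} \begin{bmatrix} \mathbf{x}_U - \mathbf{c}'_U \\ \mathbf{x}_F - \mathbf{c}'_F \end{bmatrix} \le \gamma_{\mathrm{eff}}, \tag{2.7}$$

where the subset $Y$ has been partitioned into $U$ and $F$, $\mathbf{c}'_U = \mathbf{c}_U + \left( (\mathbf{Q}_{YY})^{-1} \mathbf{Q}_{YZ} \mathbf{c}_Z \right)_U$,
and $\mathbf{c}'_F = \mathbf{c}_F + \left( (\mathbf{Q}_{YY})^{-1} \mathbf{Q}_{YZ} \mathbf{c}_Z \right)_F$.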

In the second step $(Z, \emptyset, U \cup F) \to (Z, U, F)$, the non-zero assumption on $\mathbf{x}_U$
allows $C(\mathbf{x}_Y)$ to be rewritten as $|U| + C(\mathbf{x}_F)$. Since $\mathbf{x}_U$ no longer has any effect on
the objective function, its value can be freely chosen, and in the interest of
minimizing $C(\mathbf{x}_F)$, $\mathbf{x}_U$ should be chosen as a function of $\mathbf{x}_F$ to maximize the margin in
constraint (2.7), thereby making the set of feasible $\mathbf{x}_F$ as large as possible. This is
equivalent to minimizing the left-hand side of (2.7) with respect to $\mathbf{x}_U$ while
holding $\mathbf{x}_F$ constant. Similar to the minimization of (2.1) with respect to $\mathbf{x}_Y$, we obtain
$\mathbf{x}_U - \mathbf{c}'_U = -(\mathbf{Q}_{UU})^{-1} \mathbf{Q}_{UF} (\mathbf{x}_F - \mathbf{c}'_F)$ as the minimizer of (2.7). Substituting back into
(2.7) results in the constraint in (2.5), except with $\mathbf{c}'_F$ in place of $\mathbf{c}_{\mathrm{eff}}$. By expressing
$(\mathbf{Q}_{YY})^{-1}$ in terms of the block decomposition of $\mathbf{Q}_{YY}$ in (2.7), it can be shown that
$\mathbf{c}'_F$ is equal to $\mathbf{c}_{\mathrm{eff}}$ as defined in (2.6b), thus completing the reduction.

In the sequel, we focus on the unrestricted root problem (1.1) with the
understanding that the results apply to any subproblem by virtue of the reduction to (2.5).
In addition, the following assumption will be made:

Assumption 2.1 Condition (2.4) is satisfied for all $n = 1, \ldots, N$.

In other words, it is assumed that a feasible solution exists whenever a single variable 
is constrained to zero, since any variables for which this is not the case can be elim- 
inated as shown in this section. Thus the focus is solely on the "difficult" part of the 
problem, i.e., those variables whose status is ambiguous. 

3 Continuous relaxation 

In the remainder of the paper, we consider two relaxations of (1.1) for the purpose
of obtaining lower bounds on its optimal cost in the context of branch-and-bound. In
Sect. 3.1, (1.1) is reformulated as a mixed-integer optimization problem, yielding a
continuous relaxation. Best-case and worst-case instances are exhibited in Sect. 3.2 to
show that continuous relaxations can provide arbitrarily tight or loose bounds on the
optimal cost of (1.1). An absolute upper bound on the optimal cost of the relaxation is
then derived in Sect. 3.3, suggesting that continuous relaxations are unlikely to yield
good approximations to (1.1) in most instances.

3.1 Derivation 

Problem (1.1) is first reformulated as a mixed integer optimization problem by
associating with each continuous variable $x_n$ a binary-valued indicator variable $i_n$ with the
property that $i_n = 0$ if $x_n = 0$ and $i_n = 1$ otherwise. Problem (1.1) can be restated in
terms of indicator variables as follows:

$$\min_{\mathbf{x}, \mathbf{i}} \; \sum_{n=1}^{N} i_n \quad \text{s.t.} \quad (\mathbf{x} - \mathbf{c})^T \mathbf{Q} (\mathbf{x} - \mathbf{c}) \le \gamma, \quad |x_n| \le B_n i_n, \quad i_n \in \{0, 1\} \;\; \forall n. \tag{3.1}$$

The constraint $|x_n| \le B_n i_n$ is the usual forcing constraint linking $i_n$ with $x_n$ in the
desired manner, where the positive constants $B_n$ are chosen large enough to keep the
set of feasible $\mathbf{x}$ unchanged from that in (1.1). It will be seen shortly that $B_n$ should
be set to the smallest possible value subject to this requirement, i.e.,

$$B_n = \max \left\{ |x_n| : (\mathbf{x} - \mathbf{c})^T \mathbf{Q} (\mathbf{x} - \mathbf{c}) \le \gamma \right\} = \max \{ B_n^+, B_n^- \},$$

where

$$B_n^{\pm} = \max \left\{ \pm x_n : (\mathbf{x} - \mathbf{c})^T \mathbf{Q} (\mathbf{x} - \mathbf{c}) \le \gamma \right\} = \sqrt{\gamma (\mathbf{Q}^{-1})_{nn}} \pm c_n. \tag{3.2}$$

The closed-form expressions for $B_n^+$ and $B_n^-$ can be derived straightforwardly from
the corresponding KKT conditions lfn i26l . 
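
As a small illustration of the constants $B_n^{\pm} = \sqrt{\gamma (\mathbf{Q}^{-1})_{nn}} \pm c_n$, the following Python sketch evaluates them from the diagonal of $\mathbf{Q}^{-1}$. The function name `coordinate_extents` and the numerical data are our own choices for illustration, and the diagonal of the inverse is assumed to be supplied by the caller.

```python
import math


def coordinate_extents(q_inv_diag, c, gamma):
    """Return (B_plus, B_minus) with B_n^(+/-) = sqrt(gamma * (Q^{-1})_nn) +/- c_n."""
    half_widths = [math.sqrt(gamma * d) for d in q_inv_diag]
    b_plus = [h + c_n for h, c_n in zip(half_widths, c)]
    b_minus = [h - c_n for h, c_n in zip(half_widths, c)]
    return b_plus, b_minus


# Illustrative data: Q = diag(4, 1), so diag(Q^{-1}) = (0.25, 1); c = (0.25, -0.5), gamma = 1.
b_plus, b_minus = coordinate_extents([0.25, 1.0], [0.25, -0.5], 1.0)
b = [max(p, m) for p, m in zip(b_plus, b_minus)]   # B_n = max{B_n^+, B_n^-}
```

Both extents are nonnegative here because the data satisfy $c_n^2 \le \gamma (\mathbf{Q}^{-1})_{nn}$ for every $n$, which is the situation assumed in the remainder of the paper.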

A continuous relaxation of (3.1) results from relaxing the binary-value constraints
on $i_n$ to interval constraints $0 \le i_n \le 1$. By minimizing the objective with respect to
$\mathbf{i}$ and substituting back into (3.1), we obtain the following minimization with respect
to $\mathbf{x}$:

$$\min_{\mathbf{x}} \; \sum_{n=1}^{N} \frac{|x_n|}{B_n} \quad \text{s.t.} \quad (\mathbf{x} - \mathbf{c})^T \mathbf{Q} (\mathbf{x} - \mathbf{c}) \le \gamma. \tag{3.3}$$

The continuous relaxation (3.3) is a quadratically-constrained weighted 1-norm
minimization and is therefore a convex problem. The optimal cost in (3.3) is clearly a
lower bound on the optimal cost in (3.1) since the feasible set has been enlarged;
more precisely, since the latter must be an integer, the ceiling of the former is also a
lower bound. It is also seen that the lower bound is maximized when the constants $B_n$
are as small as possible.

A stronger lower bound on (3.1) can be obtained by first separating each variable
$x_n$ into its positive and negative parts $x_n^+$ and $x_n^-$ as follows:

$$x_n = x_n^+ - x_n^-, \qquad x_n^+, x_n^- \ge 0. \tag{3.4}$$

By assigning to each pair $x_n^+$, $x_n^-$ corresponding indicator variables $i_n^+$, $i_n^-$ and
constants $B_n^+$, $B_n^-$, a mixed integer optimization problem equivalent to (3.1) may be
formulated, where the values of $B_n^+$ and $B_n^-$ are given by (3.2). The continuous
relaxation of this alternative mixed integer formulation corresponds to the following
quadratically-constrained linear program:

$$\min_{\mathbf{x}^+, \mathbf{x}^-} \; \sum_{n=1}^{N} \left( \frac{x_n^+}{B_n^+} + \frac{x_n^-}{B_n^-} \right) \quad \text{s.t.} \quad (\mathbf{x}^+ - \mathbf{x}^- - \mathbf{c})^T \mathbf{Q} (\mathbf{x}^+ - \mathbf{x}^- - \mathbf{c}) \le \gamma, \quad \mathbf{x}^{\pm} \ge 0. \tag{3.5}$$

Using (3.4) to replace the absolute value functions in (3.3) with linear functions as
done in linear programming [3], it can be seen that (3.3) is a special case of (3.5) with
$B_n^+$ and $B_n^-$ replaced by $B_n$. Since $B_n = \max\{B_n^+, B_n^-\}$, the optimal cost in (3.5) is at
least as large as that in (3.3), and therefore (3.5) is at least as strong a relaxation as
(3.3). The term continuous relaxation will refer henceforth to (3.5) with $B_n^{\pm}$ given by
(3.2).

Fig. 3.1 shows a graphical interpretation of the continuous relaxation (3.5). The
asymmetric diamond represents a level contour of the cost function, which can be
regarded as a weighted 1-norm with different weights for positive and negative
component values. As seen from (3.2), the weights correspond to the maximum extent
of the ellipsoid $\mathcal{E}_{\mathbf{Q}}$ along the positive and negative coordinate directions and can be
found graphically as indicated in Fig. 3.1. The solution to the weighted 1-norm
minimization can be visualized by inflating the diamond until it just touches the ellipsoid.
Note that Assumption 2.1 implies that $\mathcal{E}_{\mathbf{Q}}$ must intersect all of the coordinate planes.
In Sect. 3.2, we will draw upon the geometric intuition in Fig. 3.1 to construct
best-case and worst-case instances for continuous relaxation.

Fig. 3.1 Interpretation of the continuous relaxation as a weighted 1-norm minimization and a graphical 
representation of its solution. 



3.2 Best-case and worst-case instances 

In this subsection, instances of problem (1.1) are exhibited to show that the
continuous relaxation can be a tight approximation to (1.1) as well as an arbitrarily poor one.
The quality of approximation is characterized by the approximation ratio, defined
as the ratio of the optimal cost of the relaxation to the optimal cost of the original
problem.

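In the instances to be constructed, we take $\mathbf{c} = \mathbf{e}$ and $\gamma = 1$, which can be regarded
as a normalization. The matrix $\mathbf{Q}$ is restricted to be of the form

$$\mathbf{Q} = \lambda_2 \mathbf{I} - (\lambda_2 - \lambda_1) \mathbf{v} \mathbf{v}^T, \tag{3.6}$$

where $\lambda_2 > \lambda_1$ and $\mathbf{v}$ is a vector with unit 2-norm and components equal to $\pm 1/\sqrt{N}$. It
follows from (3.6) that $\mathbf{v}$ is an eigenvector of $\mathbf{Q}$ with eigenvalue $\lambda_1$ and the remaining
$N - 1$ eigenvectors are orthogonal to $\mathbf{v}$ with eigenvalue $\lambda_2$. Geometrically, the
ellipsoid $\mathcal{E}_{\mathbf{Q}}$ corresponding to (3.6) has a single long principal axis in the direction $\mathbf{v}$ and
shorter and equal principal axes in the other directions. We note for later use that the
inverse of $\mathbf{Q}$ and the Schur complement $\mathbf{Q}/\mathbf{Q}_{YY}$ can be computed explicitly as

$$\mathbf{Q}^{-1} = \frac{1}{\lambda_2} \mathbf{I} + \frac{\lambda_2 - \lambda_1}{\lambda_1 \lambda_2} \mathbf{v} \mathbf{v}^T, \tag{3.7}$$

$$\mathbf{Q}/\mathbf{Q}_{YY} = \lambda_2 \mathbf{I} - \frac{K \lambda_2 (\lambda_2 - \lambda_1)}{K \lambda_2 + (N - K) \lambda_1} \tilde{\mathbf{v}}_Z \tilde{\mathbf{v}}_Z^T, \tag{3.8}$$

where $K = |Z|$ and $\tilde{\mathbf{v}}_Z$ is the unit 2-norm vector obtained by rescaling the subvector
$\mathbf{v}_Z$.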

To construct best-case instances for which the continuous relaxation is a tight
approximation to (1.1), our aim is to make the optimal cost of the relaxation as large as
possible. Based on Fig. 3.1 and the above structure for $\mathbf{Q}$, this can be done by
choosing the major axis of $\mathcal{E}_{\mathbf{Q}}$ to be parallel to a level surface of the 1-norm and keeping the
lengths of the minor axes to a minimum, thus allowing the $\ell_1$ ball to grow relatively
unimpeded. Algebraically, we set $\lambda_1 = 1/N$, $\lambda_2 = N$, $\lceil N/2 \rceil$ of the components of $\mathbf{v}$
equal to $+1/\sqrt{N}$, and the remaining components of $\mathbf{v}$ equal to $-1/\sqrt{N}$.

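First it is shown that the point $\mathbf{c} - \sqrt{N} \mathbf{v}$ is optimal for (1.1) with a corresponding
cost of $\lfloor N/2 \rfloor$. Feasibility follows from substitution into the constraint in (1.1). To
prove optimality, we verify that an additional zero-valued component is not feasible,
i.e., condition (2.3) is violated for $K = N - \lfloor N/2 \rfloor + 1 = \lceil N/2 \rceil + 1$. Substituting (3.8)
and $\mathbf{c} = \mathbf{e}$ into (2.3) and rearranging, we obtain

$$E_0(K) = \lambda_2 \left( \lceil N/2 \rceil + 1 \right) \left[ 1 - \frac{(\lceil N/2 \rceil + 1)(\lambda_2 - \lambda_1)}{(\lceil N/2 \rceil + 1) \lambda_2 + (\lfloor N/2 \rfloor - 1) \lambda_1} \max_{|Z| = \lceil N/2 \rceil + 1} \frac{(\tilde{\mathbf{v}}_Z^T \mathbf{e})^2}{\lceil N/2 \rceil + 1} \right]. \tag{3.9}$$

The maximum in (3.9) is achieved by choosing $Z$ to include all $\lceil N/2 \rceil$ positive
components of $\mathbf{v}$ and only one negative component, resulting in a maximum value of
$(\lceil N/2 \rceil - 1)^2 / (\lceil N/2 \rceil + 1)^2$. The quantity $E_0(K)$ can then be bounded from below by
removing the fraction in front of the maximization, which is less than 1. With $\lambda_2 = N$, this yields

$$E_0(K) > N \left( \lceil N/2 \rceil + 1 \right) \left[ 1 - \frac{(\lceil N/2 \rceil - 1)^2}{(\lceil N/2 \rceil + 1)^2} \right] = \frac{4 N \lceil N/2 \rceil}{\lceil N/2 \rceil + 1},$$

which can be seen to be strictly greater than $\gamma = 1$ as required.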

We now prove that the lower bound provided by the continuous relaxation is equal
to the optimal cost of $\lfloor N/2 \rfloor$ for the unrelaxed problem. Toward this end, we make
use of the Lagrangian dual of the continuous relaxation, given by

$$\max_{\boldsymbol{\mu}} \; \mathbf{c}^T \boldsymbol{\mu} - \sqrt{\gamma \, \boldsymbol{\mu}^T \mathbf{Q}^{-1} \boldsymbol{\mu}} \quad \text{s.t.} \quad -\mathbf{g}^- \le \boldsymbol{\mu} \le \mathbf{g}^+, \tag{3.10}$$

where $g_n^{\pm} = 1/B_n^{\pm}$ for $n = 1, \ldots, N$. A derivation of the dual problem can be found
in [26]. It is shown that the optimal cost of the dual is strictly bounded from below
by $\lfloor N/2 \rfloor - 1$, implying through duality that the optimal cost of the primal (3.5) is
between $\lfloor N/2 \rfloor - 1$ and $\lfloor N/2 \rfloor$ and is equal to $\lfloor N/2 \rfloor$ after rounding up to the next
integer. From (3.2) and (3.7), we find that $B_n^+ = 1 + \sqrt{1 + (N-1)/N^2} \equiv B^+$ for all
$n$. Substituting the dual feasible solution $\boldsymbol{\mu} = \mathbf{g}^+ = (1/B^+) \mathbf{e}$ into the dual objective
function and simplifying, we obtain

$$\begin{cases} \dfrac{N - 1}{B^+}, & N \text{ even}, \\[2ex] \dfrac{N - \sqrt{2 - 1/N^2}}{B^+}, & N \text{ odd}, \end{cases} \tag{3.11}$$

as a lower bound on the dual optimal cost. Straightforward algebraic manipulations
show that the quantities in (3.11) are strictly greater than $\lfloor N/2 \rfloor - 1$ in the two cases
of $N$ even and $N$ odd. This completes the demonstration of the potential tightness of
the continuous relaxation lower bound.
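
The dual bound for the best-case instances can be checked numerically. The sketch below is our own illustration under the stated construction ($\mathbf{c} = \mathbf{e}$, $\gamma = 1$, $\lambda_1 = 1/N$, $\lambda_2 = N$, and $\lceil N/2 \rceil$ positive components in $\mathbf{v}$); the function name `best_case_dual_value` is hypothetical. It builds $\mathbf{Q}^{-1} = \mathbf{I}/\lambda_2 + ((\lambda_2 - \lambda_1)/(\lambda_1 \lambda_2)) \mathbf{v}\mathbf{v}^T$ explicitly and evaluates the dual objective $\mathbf{c}^T \boldsymbol{\mu} - \sqrt{\gamma \boldsymbol{\mu}^T \mathbf{Q}^{-1} \boldsymbol{\mu}}$ at $\boldsymbol{\mu} = \mathbf{g}^+$.

```python
import math


def best_case_dual_value(N):
    """Dual objective at mu = g^+ for the best-case instance (c = e, gamma = 1)."""
    lam1, lam2 = 1.0 / N, float(N)
    n_pos = math.ceil(N / 2)                      # number of +1/sqrt(N) entries in v
    v = [(1.0 if n < n_pos else -1.0) / math.sqrt(N) for n in range(N)]
    coef = (lam2 - lam1) / (lam1 * lam2)
    # Closed-form inverse of Q = lam2 * I - (lam2 - lam1) * v v^T.
    Q_inv = [[(1.0 / lam2 if m == n else 0.0) + coef * v[m] * v[n]
              for n in range(N)] for m in range(N)]
    # g_n^+ = 1 / B_n^+ with B_n^+ = sqrt((Q^{-1})_nn) + c_n and c_n = 1.
    mu = [1.0 / (math.sqrt(Q_inv[n][n]) + 1.0) for n in range(N)]
    quad = sum(mu[m] * Q_inv[m][n] * mu[n] for m in range(N) for n in range(N))
    return sum(mu) - math.sqrt(quad)              # c^T mu - sqrt(mu^T Q^{-1} mu)


for N in range(2, 12):
    value = best_case_dual_value(N)
    # The rounded-up dual value should match floor(N/2), the unrelaxed optimum.
    print(N, round(value, 4), math.ceil(value) == N // 2)
```

For each $N$ in the loop the value lies strictly between $\lfloor N/2 \rfloor - 1$ and $\lfloor N/2 \rfloor$, in agreement with the argument above; the check is of course no substitute for the algebraic proof.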

Next we construct instances for which the lower bound resulting from the
continuous relaxation is as loose as possible. The worst-case scenario corresponds to the
optimal cost in (1.1) being equal to $N - 1$ and the optimal cost of the relaxation being
less than 1. The former cannot equal $N$ given Assumption 2.1 while the latter cannot
equal zero exactly since that would require $\mathbf{x} = \mathbf{0}$ to be a feasible solution, in which
case the optimal cost in (1.1) is also zero. Referring again to Fig. 3.1 and the form of
$\mathbf{Q}$ in (3.6), the optimal cost of the continuous relaxation can be minimized by
orienting the major axis of the ellipsoid $\mathcal{E}_{\mathbf{Q}}$ so that it points toward the origin and obstructs
the growth of the $\ell_1$ ball. Algebraically, we set $\mathbf{v} = (1/\sqrt{N}) \mathbf{e}$, $\lambda_1 = 1/(N - 1)$, and
$\lambda_2 = (N - 1)/2$. We verify that the unrelaxed optimal cost is equal to $N - 1$. From
(3.7), we have $(\mathbf{Q}^{-1})_{nn} = (N + 1)/N$, which ensures that (2.4) is satisfied for all $n$.
Using (3.8), $E_0(K)$ in (2.3) for $K = 2$ evaluates to $N(N - 1)/(N(N - 1) - 1)$. Since
this quantity is greater than $\gamma = 1$, condition (2.3) is violated for $K \ge 2$ and the optimal
cost in (1.1) must be equal to $N - 1$.

To show that the optimal cost of the continuous relaxation is less than 1, we
consider the feasible and strictly positive solution
$\mathbf{x}^+ = \mathbf{c} - (1/\sqrt{\lambda_1}) \mathbf{v} = \left( 1 - \sqrt{(N - 1)/N} \right) \mathbf{e}$,
$\mathbf{x}^- = \mathbf{0}$. From (3.5), the corresponding cost is

$$\frac{N - \sqrt{N(N - 1)}}{B^+}, \tag{3.12}$$

where $B^+ = 1 + \sqrt{(N + 1)/N}$ is the common value for the constants $B_n^+$ given by
(3.2). Since $B^+ > 2$ while the numerator in (3.12) can be seen to be less than 1, we
conclude that the optimal cost in (3.5) is less than 1 as claimed. The
approximation ratio in these instances is thus equal to $1/(N - 1)$, which approaches zero as
$N$ increases.
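
The claims made for the worst-case instances can be verified numerically for a given $N$. The following Python sketch is our own check, not part of the original experiments: `invert` is a plain Gauss-Jordan helper suitable only for small matrices, `worst_case_check` is a hypothetical name, and $E_0(2)$ is obtained by brute-force enumeration of all index pairs using the Schur complement in the form $((\mathbf{Q}^{-1})_{ZZ})^{-1}$. The construction requires $N \ge 3$ so that $\lambda_2 > \lambda_1$.

```python
import itertools
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [value / p for value in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def worst_case_check(N):
    """Return ((Q^{-1})_11, E_0(2), cost of the feasible point) for c = e, gamma = 1."""
    lam1, lam2 = 1.0 / (N - 1), (N - 1) / 2.0
    # Q = lam2 * I - (lam2 - lam1) * v v^T with v = e / sqrt(N), so v_m * v_n = 1 / N.
    Q = [[(lam2 if m == n else 0.0) - (lam2 - lam1) / N for n in range(N)]
         for m in range(N)]
    Q_inv = invert(Q)
    # E_0(2): minimise e_Z^T ((Q^{-1})_ZZ)^{-1} e_Z (sum of all entries) over pairs Z.
    e0 = min(sum(map(sum, invert([[Q_inv[m][n] for n in Z] for m in Z])))
             for Z in itertools.combinations(range(N), 2))
    b_plus = math.sqrt(Q_inv[0][0]) + 1.0                   # common value of B_n^+
    relaxed_cost = (N - math.sqrt(N * (N - 1))) / b_plus    # cost of x^+ = (1 - sqrt((N-1)/N)) e
    return Q_inv[0][0], e0, relaxed_cost


diag_entry, e0, relaxed_cost = worst_case_check(5)
```

For $N = 5$ the three returned quantities can be compared against $(N+1)/N$, $N(N-1)/(N(N-1)-1)$, and the threshold 1, respectively.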



3.3 An absolute upper bound 



The constructions in Sect. 3.2 imply that the approximation ratio for the continuous
relaxation can range anywhere between 0 and 1, and thus it is not possible to place
a non-trivial bound on the ratio that holds for all instances of (1.1). It is possible
however to obtain an absolute upper bound on the optimal cost of the continuous
relaxation in terms of the problem dimension $N$.



Proposition 3.1 Under Assumption 2.1, the optimal cost of the continuous relaxation
(3.5) is bounded from above by $\theta N/2$, where $\theta = 1 - \sqrt{\gamma / \mathbf{c}^T \mathbf{Q} \mathbf{c}}$.

Proof Consider the solution $\mathbf{x}^+ - \mathbf{x}^- = \theta \mathbf{c}$, i.e., $x_n^+ = \theta c_n$, $x_n^- = 0$ for $c_n \ge 0$ and
$x_n^+ = 0$, $x_n^- = \theta |c_n|$ for $c_n < 0$. It can be verified that this is a feasible solution for
the continuous relaxation (the solution lies on the boundary of the ellipsoid $\mathcal{E}_{\mathbf{Q}}$), and
hence the optimal cost of the relaxation is bounded from above by

$$\sum_{n : c_n > 0} \frac{\theta c_n}{B_n^+} + \sum_{n : c_n < 0} \frac{\theta |c_n|}{B_n^-} = \theta \sum_{n=1}^{N} \frac{|c_n|}{\sqrt{\gamma (\mathbf{Q}^{-1})_{nn}} + |c_n|} \tag{3.13}$$

using (3.2). Assumption 2.1 then implies that each of the fractions on the right-hand
side of (3.13) is no greater than 1/2, completing the proof. □
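
The bound of Proposition 3.1 can be evaluated directly for a small instance. The sketch below is an illustration of ours with made-up data: it computes $\theta = 1 - \sqrt{\gamma / \mathbf{c}^T \mathbf{Q} \mathbf{c}}$, the cost $\theta \sum_n |c_n| / (\sqrt{\gamma (\mathbf{Q}^{-1})_{nn}} + |c_n|)$ of the feasible point used in the proof, and the bound $\theta N/2$. The function name `proposition_3_1_check` is hypothetical, and the diagonal of $\mathbf{Q}^{-1}$ is assumed to be supplied by the caller.

```python
import math


def proposition_3_1_check(Q, q_inv_diag, c, gamma):
    """Return (cost of the feasible point with x^+ - x^- = theta * c, bound theta * N / 2)."""
    N = len(c)
    c_Q_c = sum(c[m] * Q[m][n] * c[n] for m in range(N) for n in range(N))
    theta = 1.0 - math.sqrt(gamma / c_Q_c)
    # B_n^+ for c_n > 0 and B_n^- for c_n < 0 both equal sqrt(gamma (Q^{-1})_nn) + |c_n|.
    cost = theta * sum(abs(c_n) / (math.sqrt(gamma * d) + abs(c_n))
                       for c_n, d in zip(c, q_inv_diag))
    return cost, theta * N / 2.0


# Illustrative data: Q = [[2, 1], [1, 2]] has diag(Q^{-1}) = (2/3, 2/3);
# c = (0.5, 0.5) satisfies c_n^2 / (Q^{-1})_nn = 0.375 <= gamma = 1 for both n.
Q = [[2.0, 1.0], [1.0, 2.0]]
cost, bound = proposition_3_1_check(Q, [2.0 / 3.0, 2.0 / 3.0], [0.5, 0.5], 1.0)
```

Here $\mathbf{c}^T \mathbf{Q} \mathbf{c} = 1.5 > \gamma$, so $\theta > 0$, and the computed cost stays below $\theta N / 2$ as the proposition requires.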



Proposition 3.1 indicates that the continuous relaxation cannot be a tight
approximation if the optimal cost in (1.1) is greater than $\lceil \theta N/2 \rceil$. This suggests that it is
unlikely for the continuous relaxation to yield a strong bound on (1.1) in most
instances, since if it did, this would imply that the optimal cost in (1.1) is not much
greater than $\theta N/2$ in most cases, a fact considered unlikely. The situation is
exacerbated if the factor $\theta$ is small. This negative result motivates the consideration of an
alternative relaxation as described in Sect. 4.

We note in closing that Lemaréchal and Oustry [19] have shown that a common
semidefinite relaxation technique is equivalent to continuous relaxation when applied
to cardinality minimization problems such as (1.1). As a consequence, the properties
of the continuous relaxation (3.5) noted in this section also apply to this type of
semidefinite relaxation.



4 Diagonal relaxation 

As an alternative to continuous relaxations, in this section we discuss relaxations of
problem (1.1) in which the matrix $\mathbf{Q}$ is replaced by a diagonal matrix, an approach
referred to as diagonal relaxation. As will be seen in Sect. 4.1, problem (1.1) is easily
solved in the diagonal case, thus making it attractive as a relaxation of the
problem when $\mathbf{Q}$ is non-diagonal. It is shown in Sect. 4.2 that diagonal relaxations can
yield exact as well as arbitrarily poor approximations to (1.1), as was the case for
the continuous relaxation in Sect. 3. However, numerical evidence in Sect. 5 and
elsewhere [27] indicates that the lower bounds provided by diagonal relaxations are
often significantly stronger than those from continuous relaxations. This
computational experience motivates a better theoretical understanding of situations to which
diagonal relaxations are particularly well-suited. Within this context, approximation
guarantees are derived in Sects. 4.3–4.5 for the three specific cases of well-conditioned
$\mathbf{Q}$ matrices, diagonally dominant $\mathbf{Q}$, and nearly coordinate-aligned ellipsoids $\mathcal{E}_{\mathbf{Q}}$.

4.1 Derivation 

To obtain a diagonal relaxation of problem (1.1), the matrix $\mathbf{Q}$ is replaced with a
positive definite diagonal matrix $\mathbf{D}$ to yield a similar constraint:

$$(\mathbf{x} - \mathbf{c})^T \mathbf{D} (\mathbf{x} - \mathbf{c}) = \sum_{n=1}^{N} D_{nn} (x_n - c_n)^2 \le \gamma. \tag{4.1}$$

Geometrically, (4.1) specifies an ellipsoid, denoted as $\mathcal{E}_{\mathbf{D}}$, with axes that are aligned
with the coordinate axes. Since the relaxation is intended to provide a lower bound
for the original problem, we require that the coordinate-aligned ellipsoid $\mathcal{E}_{\mathbf{D}}$ enclose
the original ellipsoid $\mathcal{E}_{\mathbf{Q}}$ so that minimizing over $\mathcal{E}_{\mathbf{D}}$ yields a lower bound on the
minimum over $\mathcal{E}_{\mathbf{Q}}$. For simplicity, the two ellipsoids are taken to be concentric, in which
case the nesting of the ellipsoids is equivalent to the condition $\mathbf{D} \preceq \mathbf{Q}$. Sufficiency
follows from the inequality

$$(\mathbf{x} - \mathbf{c})^T \mathbf{D} (\mathbf{x} - \mathbf{c}) \le (\mathbf{x} - \mathbf{c})^T \mathbf{Q} (\mathbf{x} - \mathbf{c}) \quad \forall \mathbf{x}, \tag{4.2}$$

so if $\mathbf{x} \in \mathcal{E}_{\mathbf{Q}}$, then both sides of (4.2) are bounded by $\gamma$ and $\mathbf{x} \in \mathcal{E}_{\mathbf{D}}$. Conversely, if
$\mathbf{D} \not\preceq \mathbf{Q}$, then there exists a vector $\mathbf{x}$ that violates (4.2), and by scaling $\mathbf{x} - \mathbf{c}$ so that
the right-hand side of (4.2) is equal to $\gamma$, we have $\mathbf{x} \in \mathcal{E}_{\mathbf{Q}}$ but $\mathbf{x} \notin \mathcal{E}_{\mathbf{D}}$ since $\mathbf{x}$ does not
satisfy (4.1).

Problem (1.1) is greatly simplified in the diagonal case. Replacing $\mathbf{Q}$ by $\mathbf{D}$,
condition (2.3) simplifies to

$$\min_{|Z| = K} \sum_{n \in Z} D_{nn} c_n^2 \le \gamma$$

since $\mathbf{D}/\mathbf{D}_{YY} = \mathbf{D}_{ZZ}$. The minimum is attained by choosing $Z$ to correspond to the $K$
smallest elements of the sequence $D_{nn} c_n^2$, $n = 1, \ldots, N$. It follows that (4.1) admits a
solution with $K$ zero-valued components if and only if

$$S_K(\{D_{nn} c_n^2\}) \le \gamma, \tag{4.3}$$

where $S_K$ denotes the sum of the $K$ smallest elements of a sequence. The minimum
cardinality corresponds to the largest value of $K$ such that (4.3) holds.
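
The diagonal case therefore reduces to sorting and a running sum. A minimal Python sketch of this procedure follows; the function name `diagonal_min_cardinality` and the sample data are ours, chosen only for illustration.

```python
def diagonal_min_cardinality(d, c, gamma):
    """Minimum cardinality for D = diag(d): N minus the largest K with S_K <= gamma."""
    weights = sorted(d_n * c_n ** 2 for d_n, c_n in zip(d, c))
    running_sum, zeros = 0.0, 0
    for w in weights:                  # zero the cheapest coordinates first
        if running_sum + w > gamma:
            break                      # weights are nonnegative, so no later K can succeed
        running_sum += w
        zeros += 1
    return len(c) - zeros


# Illustrative data: the sorted values D_nn * c_n^2 are 0.5, 0.5, 1.0, 4.0.
card = diagonal_min_cardinality([1.0, 2.0, 0.5, 4.0], [1.0, 0.5, 1.0, 1.0], gamma=1.0)
```

With $\gamma = 1$ the two smallest values sum to exactly 1, so two components can be set to zero and the minimum cardinality is 2.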




For every $D$ satisfying $0 \preceq D \preceq Q$, minimizing $C(x)$ subject to (4.1) results in a lower bound on the optimal cost in (1.1). Thus the set of diagonal relaxations is parameterized by $D$ as illustrated in Fig. 4.1. We are naturally interested in obtaining a diagonal relaxation that is as tight as possible, i.e., a matrix $D_d$ such that the minimum cardinality associated with $D_d$ is maximal among all valid choices of $D$. Such a relaxation can be determined based on condition (4.3), specifically by solving the following optimization problem:

$$E_d(K) = \max_{D} \; S_K(\{D_{nn} c_n^2\}) \quad \text{s.t.} \quad 0 \preceq D \preceq Q, \quad D \text{ diagonal}, \tag{4.4}$$

for selected values of $K$. If $E_d(K)$ in (4.4) is less than or equal to $\gamma$, then (4.3) holds for every $D$ satisfying the constraints in (4.4), and consequently a feasible solution $x$ with $K$ zero-valued components exists for every such $D$. We conclude that the optimal cost of any diagonal relaxation is at most $N - K$. On the other hand, if $E_d(K) > \gamma$, then according to (4.3) there exists a $D$ for which a vector $x$ with $K$ zero-valued components is not feasible, and for this $D$ the optimal cost of the corresponding diagonal relaxation is at least $N - K + 1$. By selecting values of $K$ to perform a bisection search over $1, \ldots, N$ and solving (4.4) each time, we eventually arrive at the highest possible optimal cost under any diagonal relaxation, i.e., the tightest lower bound on (1.1) achievable with a diagonal relaxation. Henceforth the term diagonal relaxation will be understood to refer to the tightest such relaxation.
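The bisection over $K$ can be sketched as follows. The callable `E_d` is a hypothetical oracle standing in for a solver of (4.4); it is not part of the paper. The sketch relies on $E_d(K)$ being nondecreasing in $K$, which holds because $S_K$ is nondecreasing in $K$ for nonnegative sequences.

```python
def tightest_diagonal_bound(E_d, N, gamma):
    """Bisection for the largest K in 0..N with E_d(K) <= gamma.

    E_d is a caller-supplied function (hypothetical here) returning the
    optimal value of (4.4) for a given K. Returns (K_d, lower_bound,
    evaluations), where lower_bound = N - K_d bounds the optimal
    cardinality of the original problem from below.
    """
    lo, hi = 0, N  # K = 0 is always feasible since S_0 = 0
    evaluations = 0
    while lo < hi:
        mid = (lo + hi + 1) // 2
        evaluations += 1
        if E_d(mid) <= gamma:
            lo = mid       # K = mid zeros are achievable
        else:
            hi = mid - 1   # mid zeros are not achievable for the best D
    return lo, N - lo, evaluations
```

Each iteration halves the candidate range, so the oracle is called at most $\lfloor \log_2 N \rfloor + 1$ times.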

The above procedure determines both the tightest possible diagonal relaxation and its optimal cost at the same time, and amounts to solving (4.4) for a maximum of $\lfloor \log_2 N \rfloor + 1$ values of $K$. Since the function $S_K$ is concave in $D$ [6] and the constraints in (4.4) are convex, the maximization in (4.4) is a convex problem. Furthermore, (4.4) can be recast as a standard semidefinite program following [6] by expressing the function $S_K$ as the optimal cost of a linear program and then substituting the Lagrangian dual of the linear program. Thus (4.4) can be solved efficiently using standard interior-point algorithms. Further efficiency enhancements can be made as detailed in [26, Sec. 3.5].

4.2 Worst-case instances 

As with the continuous relaxation in Sect. 3.2, we consider extreme instances in which the diagonal relaxation is either a tight approximation to the original problem or an arbitrarily poor one. It is clear that if $Q$ is already diagonal, the diagonal relaxation and the original problem coincide and the approximation ratio defined in Sect. 3.2 is equal to 1. It is shown that the approximation ratio can also equal zero, i.e., the optimal cost of the diagonal relaxation can be zero while the original problem has a non-zero optimal cost. Based on Fig. 4.1, the diagonal relaxation is expected to result in a poor approximation when the original ellipsoid $\mathcal{E}_Q$ is far from coordinate-aligned, thus forcing the coordinate-aligned enclosing ellipsoid to be much larger than $\mathcal{E}_Q$. This situation is exemplified by the first class of instances in Sect. 3.2, in which $\mathcal{E}_Q$ is dominated by a single long axis with equal components in all coordinate directions. To show that the diagonal relaxation has an optimal cost of zero in these instances, we make use of the following lemma.

Lemma 4.1 Assume that the vector $c$ has unit-magnitude components. Then the optimal cost $E_d(K)$ in (4.4) is bounded from below by $K \lambda_{\min}(Q)$. This lower bound is tight if the eigenvector $v$ corresponding to $\lambda_{\min}(Q)$ has components of equal magnitude.

Proof The diagonal matrix $D = \lambda_{\min}(Q) I$ satisfies $D \preceq Q$ and is therefore a feasible solution to (4.4). Hence the corresponding objective value $K \lambda_{\min}(Q)$ (with $c_n^2 = 1$ for all $n$) is a lower bound on $E_d(K)$. If the eigenvector $v$ has equal-magnitude components and is normalized to have unit 2-norm, then the inequality $D \preceq Q$ implies that

$$v^T D v = \frac{1}{N} \sum_{n=1}^{N} D_{nn} \le v^T Q v = \lambda_{\min}(Q) \tag{4.5}$$

for any feasible $D$ in (4.4). The solution $D = \lambda_{\min}(Q) I$ satisfies (4.5) with equality and is therefore an optimal solution to (4.4) for $K = N$ under the assumptions of the lemma, yielding $E_d(N) = N \lambda_{\min}(Q)$. Using the fact that the mean of the $K$ smallest $D_{nn}$ for $K < N$ is no greater than the mean of all diagonal entries, it follows from (4.5) that

$$S_K(\{D_{nn}\}) \le K \lambda_{\min}(Q), \quad K = 1, 2, \ldots, N - 1, \tag{4.6}$$

again for any feasible $D$ in (4.4). Since the solution $D = \lambda_{\min}(Q) I$ also satisfies (4.6) with equality, it is an optimal solution to (4.4) for all $K$ under the assumptions of the lemma and we have $E_d(K) = K \lambda_{\min}(Q)$. □

In the first class of instances in Sect. 3.2, $c = e$, $\lambda_{\min}(Q) = \lambda_1 = 1/N$ and the corresponding eigenvector $v$ has equal-magnitude components. It follows from Lemma 4.1 that $E_d(K) = K \lambda_1 = K/N$, which does not exceed $\gamma = 1$ for any $K$. Hence the optimal cost of the diagonal relaxation is zero while the optimal cost in the unrelaxed problem (1.1) is $\lfloor N/2 \rfloor$. This implies that it is not possible to bound the approximation ratio away from zero for all instances of (1.1), as with the continuous relaxation. Furthermore, since the continuous relaxation yields a tight approximation for the same class of instances, neither relaxation strictly dominates the other (diagonal relaxations are clearly dominant in the case of diagonal $Q$). These conclusions, however, are based on extreme instances. It will be seen in Sect. 5 that in more typical instances the diagonal relaxation can offer a significantly better quality of approximation than the continuous relaxation. In addition, non-trivial lower bounds on the diagonal relaxation approximation ratio can be obtained as in Sects. 4.3-4.5 when the class of instances of (1.1) is restricted.

4.3 Eigenvalue-based approximation guarantees 

In this subsection, the quality of approximation of the diagonal relaxation is characterized in terms of the eigenvalues of the matrix $Q$. The resulting bounds on the approximation ratio are strongest in the case of well-conditioned $Q$, i.e., when the eigenvalues of $Q$ have a low spread. Geometrically, the well-conditioned case corresponds to a nearly spherical ellipsoid $\mathcal{E}_Q$, which can be enclosed by a coordinate-aligned ellipsoid $\mathcal{E}_D$ of comparable size as illustrated in Fig. 4.2. Given the close approximation of $\mathcal{E}_Q$ by $\mathcal{E}_D$ in terms of volume, one would expect a close approximation in terms of the cardinality cost as well. This geometric intuition is confirmed by the analysis.




Fig. 4.2 Diagonal relaxations for two ellipsoids $\mathcal{E}_Q$ with contrasting condition numbers.

The results presented in the remainder of the section are more conveniently stated in terms of the number of zero-valued components rather than the number of non-zero components. Define $K^*$ to be the maximum number of zero-valued components in (1.1) and $K_d$ to be the maximum number of zero-valued components in the diagonal relaxation of (1.1). The enclosing condition $\mathcal{E}_Q \subseteq \mathcal{E}_D$ ensures that $K^* \le K_d$, and a good approximation corresponds to the ratio $K_d / K^*$ being not much larger than 1. It is shown that $K^*$ and $K_d$ can be bounded by the following quantities related to the eigenvalues of $Q$ and its Schur complements:

$$\underline{K} = \max\left\{K : \lambda_{\max}\left(Q / Q_{Y(K)Y(K)}\right) S_K(\{c_n^2\}) \le \gamma\right\}, \tag{4.7a}$$

$$\bar{K} = \max\left\{K : \lambda_{\min}(Q)\, S_K(\{c_n^2\}) \le \gamma\right\}, \tag{4.7b}$$

where $Y(K)$ denotes the index set corresponding to the $N - K$ largest-magnitude components of $c$ (its complement $Z(K)$ corresponds to the $K$ smallest components). The relationships among $K^*$, $K_d$, $\underline{K}$ and $\bar{K}$ are specified below.

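Theorem 4.2 The maximum numbers of zero-valued components in problem (1.1) and its diagonal relaxation, $K^*$ and $K_d$ respectively, satisfy the ordering $\underline{K} \le K^* \le K_d \le \bar{K}$, where $\underline{K}$ and $\bar{K}$ are defined in (4.7). Furthermore, the approximation ratio $K_d / K^*$ is bounded as follows:

$$\frac{K_d}{K^*} \le \frac{\bar{K}}{\underline{K}} \le \frac{\left\lceil (\underline{K} + 1)\, \lambda_{\max}\left(Q / Q_{Y(\underline{K}+1)Y(\underline{K}+1)}\right) / \lambda_{\min}(Q) \right\rceil - 1}{\underline{K}}. \tag{4.8}$$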



Proof The quantity $K^*$ is equivalently the largest value of $K$ such that condition (2.3) is satisfied, and hence $K^*$ can be bounded from below through an upper bound on $E_0(K)$ in (2.3). By choosing a specific subset $Z(K)$ corresponding to the $K$ smallest-magnitude components of $c$, we obtain

$$E_0(K) = \min_{|Z| = K} \left\{ c_Z^T (Q / Q_{YY}) c_Z \right\} \le c_{Z(K)}^T \left(Q / Q_{Y(K)Y(K)}\right) c_{Z(K)} \le \lambda_{\max}\left(Q / Q_{Y(K)Y(K)}\right) S_K(\{c_n^2\}), \tag{4.9}$$

where the second inequality is due to a property of quadratic forms [17]. It follows from (4.9) and the definition of $\underline{K}$ in (4.7a) that $K^* \ge \underline{K}$. Similarly, $K_d$ is the largest value of $K$ such that $E_d(K)$ in (4.4) is no greater than $\gamma$ and can therefore be bounded from above through a lower bound on $E_d(K)$. Since $D = \lambda_{\min}(Q) I$ is a feasible solution to (4.4), we have $E_d(K) \ge \lambda_{\min}(Q) S_K(\{c_n^2\})$ and $K_d \le \bar{K}$ from the definition of $\bar{K}$ in (4.7b).

To obtain the bound on the ratio $\bar{K} / \underline{K}$, we infer from the definition of $\underline{K}$ in (4.7a) that $\lambda_{\max}\left(Q / Q_{Y(\underline{K}+1)Y(\underline{K}+1)}\right) S_{\underline{K}+1}(\{c_n^2\}) > \gamma$. The left-hand side of this inequality can be bounded from above as follows:

$$\lambda_{\max}\left(Q / Q_{Y(\underline{K}+1)Y(\underline{K}+1)}\right) S_{\underline{K}+1}(\{c_n^2\}) \le \lceil \tilde{\kappa} \rceil\, \lambda_{\min}(Q)\, \frac{S_{\underline{K}+1}(\{c_n^2\})}{\underline{K} + 1} \le \lambda_{\min}(Q)\, S_{\lceil \tilde{\kappa} \rceil}(\{c_n^2\}), \tag{4.10}$$

where $\tilde{\kappa} = (\underline{K} + 1)\, \lambda_{\max}\left(Q / Q_{Y(\underline{K}+1)Y(\underline{K}+1)}\right) / \lambda_{\min}(Q) \ge \underline{K} + 1$. The last inequality in (4.10) is due to the fact that the mean of the smallest elements in a sequence is non-decreasing when a larger number of elements is included. From the inequality $\lambda_{\min}(Q) S_{\lceil \tilde{\kappa} \rceil}(\{c_n^2\}) > \gamma$ and the definition of $\bar{K}$ in (4.7b), we conclude that $\bar{K} < \lceil \tilde{\kappa} \rceil$, i.e., $\bar{K} \le \lceil \tilde{\kappa} \rceil - 1$, which gives the right-hand inequality in (4.8). □

In the limit of large $\underline{K}$, the bound on the approximation ratio $K_d / K^*$ in Theorem 4.2 is approximately equal to the eigenvalue ratio $\lambda_{\max}\left(Q / Q_{Y(\underline{K}+1)Y(\underline{K}+1)}\right) / \lambda_{\min}(Q)$, which can be regarded as a type of condition number. This eigenvalue ratio is in turn bounded from above by the conventional condition number $\kappa(Q) = \lambda_{\max}(Q) / \lambda_{\min}(Q)$ [17], thus linking approximation quality in terms of cardinality to the geometric approximation quality illustrated in Fig. 4.2.

Theorem 4.2 can be strengthened somewhat by exploiting an invariance property of problem (1.1) and its diagonal relaxation. It is straightforward to see that the optimal cost in (1.1) (and hence $K^*$) is invariant to diagonal scaling transformations of the feasible set, i.e., transformations parameterized by an invertible diagonal matrix $S$ mapping $c$ to $Sc$ and $Q$ to $S^{-1} Q S^{-1}$. Likewise, the optimal cost $E_d(K)$ in (4.4) can be shown to be invariant to the same transformations, and thus $K_d$ is invariant [26]. By generalizing the definitions of $\underline{K}$ and $\bar{K}$, Theorem 4.2 can be generalized as follows:

Corollary 4.3 For any invertible diagonal matrix $S$, define $Y_S(K)$ to be the index set corresponding to the $N - K$ largest $S_{nn}^2 c_n^2$ and

$$\underline{K}_S = \max\left\{K : \lambda_{\max}\left((S^{-1} Q S^{-1}) / (S^{-1} Q S^{-1})_{Y_S(K) Y_S(K)}\right) S_K(\{S_{nn}^2 c_n^2\}) \le \gamma\right\},$$

$$\bar{K}_S = \max\left\{K : \lambda_{\min}(S^{-1} Q S^{-1})\, S_K(\{S_{nn}^2 c_n^2\}) \le \gamma\right\}.$$

Then Theorem 4.2 holds with $Q$, $\underline{K}$, $\bar{K}$, and $Y(K)$ replaced by $S^{-1} Q S^{-1}$, $\underline{K}_S$, $\bar{K}_S$, and $Y_S(K)$ respectively.

The scaling matrix $S$ can be chosen to minimize the eigenvalue ratio in Theorem 4.2, i.e., as a type of optimal diagonal preconditioner for $Q$, thus minimizing the bound on the approximation ratio.

The bounds in Theorem 4.2 are essentially tight. Specifically, it is shown that for $N \ge 5$, the inequalities $\underline{K} \le K^*$ and $K_d \le \bar{K}$ can be simultaneously tight so that the left-hand inequality in (4.8) is met with equality, while the right-hand inequality reduces to $\bar{K} / \underline{K} \le (\bar{K} + 1) / \underline{K}$ and is asymptotically tight as $N \to \infty$. We consider again the first class of instances constructed in Sect. 3.2, in which $c = e$, $\gamma = 1$, and the eigenvector $v$ corresponding to the smallest eigenvalue of $Q$ has $\lceil N/2 \rceil$ components equal to $+1/\sqrt{N}$ and $\lfloor N/2 \rfloor$ components equal to $-1/\sqrt{N}$. We keep $\lambda_1 = 1/N$ and change $\lambda_2$ to $\lambda_2 = 1 / (2 \lceil N/2 \rceil - \lfloor \sqrt{N} \rfloor - 1)$. Given these choices, (4.7b) yields $\bar{K} = N = 1/\lambda_1$, while from (3.8) we have $\lambda_{\max}(Q / Q_{Y(K)Y(K)}) = \lambda_2$ and hence $\underline{K} = 1/\lambda_2 = 2 \lceil N/2 \rceil - \lfloor \sqrt{N} \rfloor - 1$ from (4.7a). It can then be verified through substitution that the rightmost quantity in (4.8) is equal to $(\bar{K} + 1) / \underline{K}$ for $N \ge 5$ as claimed. Furthermore, the construction satisfies the assumptions of Lemma 4.1 and thus $E_d(K) = K \lambda_{\min}(Q) = K/N$, from which it follows that $K_d = N = \bar{K}$.

It remains to show that $\underline{K} = K^*$ for this class of instances. This is equivalent to showing that condition (2.3) is violated for $K = \underline{K} + 1$. Substituting (3.8) and the chosen parameter values into (2.3) and performing some simplifications, the required condition $E_0(\underline{K} + 1) > \gamma$ is equivalent to

$$(\underline{K} + 1)(\kappa - 1) \max_{|Z| = \underline{K} + 1} (e^T v_Z)^2 < (\underline{K} + 1)(\kappa - 1) + N, \tag{4.11}$$

where $\kappa = \lambda_2 / \lambda_1 = \bar{K} / \underline{K}$. As was the case in (3.9), the maximum in (4.11) is achieved by including in $Z$ all $\lceil N/2 \rceil$ positive components of $v$, with the remaining components being negative. Noting that $\underline{K} + 1 = 2 \lceil N/2 \rceil - \lfloor \sqrt{N} \rfloor \ge \lceil N/2 \rceil$ for $N \ge 5$, the maximum value can be seen to be $\lfloor \sqrt{N} \rfloor^2 / (\underline{K} + 1)$. Condition (4.11) then becomes

$$\left(2 \lceil N/2 \rceil - \lfloor \sqrt{N} \rfloor^2\right)(\kappa - 1) + N - \lfloor \sqrt{N} \rfloor (\kappa - 1) > 0,$$

which is true given that $1 < \kappa \le 2$ for $N \ge 5$.
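The quantities in (4.7b) and (4.8) for the instance above can be checked with a small sketch. The function names are hypothetical, and exact rational arithmetic is used so that the boundary case $\lambda_{\min}(Q) S_N = \gamma$ is not disturbed by rounding. The value $\underline{K} = 1/\lambda_2$ is taken from the text rather than computed from a Schur complement.

```python
import math
from fractions import Fraction


def sum_k_smallest(values, k):
    """S_K: sum of the k smallest elements of a sequence."""
    return sum(sorted(values)[:k])


def k_upper(lam_min, c, gamma):
    """K-bar in (4.7b): largest K with lam_min * S_K({c_n^2}) <= gamma."""
    squares = [c_n ** 2 for c_n in c]
    return max(K for K in range(len(c) + 1)
               if lam_min * sum_k_smallest(squares, K) <= gamma)


def ratio_bound(k_lower, eig_ratio):
    """Rightmost quantity in (4.8) for a given K-underbar and eigenvalue ratio."""
    return Fraction(math.ceil((k_lower + 1) * eig_ratio) - 1, k_lower)


def tight_instance(N):
    """Instance with c = e, gamma = 1, lambda_1 = 1/N, lambda_2 = 1/K-underbar."""
    lam1 = Fraction(1, N)
    k_lower = 2 * ((N + 1) // 2) - math.isqrt(N) - 1  # 2*ceil(N/2) - floor(sqrt N) - 1
    lam2 = Fraction(1, k_lower)
    k_bar = k_upper(lam1, [1] * N, 1)
    return k_lower, k_bar, ratio_bound(k_lower, lam2 / lam1)
```

For $N = 16$ this gives $\underline{K} = 11$, $\bar{K} = 16$ and a bound of $17/11 = (\bar{K} + 1)/\underline{K}$.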

Theorem 4.2 and Corollary 4.3 characterize the approximation quality of the diagonal relaxation in terms of extreme eigenvalues, specifically the smallest eigenvalue of $Q$ and the largest eigenvalue of a Schur complement of $Q$. A second characterization involving intermediate eigenvalues can be obtained under the stochastic assumption that the eigenvectors of $Q$ are chosen as an orthonormal set uniformly at random from the unit sphere. This assumption allows the bound on $E_0(K)$ in (4.9) to be improved, essentially replacing the largest eigenvalue of $Q / Q_{Y(\underline{K}+1)Y(\underline{K}+1)}$ with the mean eigenvalue of $Q$, $\bar{\lambda}(Q) = \frac{1}{N} \sum_{n=1}^{N} \lambda_n(Q)$. By retaining the other elements in the proof of Theorem 4.2 we obtain the following bound on the approximation ratio, which holds with high probability as $N$ becomes large.

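Theorem 4.4 Let the matrix $V$ of eigenvectors of $Q$ be drawn uniformly at random from the set of $N \times N$ orthogonal matrices. Then the approximation ratio $K_d / K^*$ is bounded from above by

$$\frac{\left\lceil (\underline{K} + 1)(1 + \epsilon)\, \bar{\lambda}(Q) / \lambda_{\min}(Q) \right\rceil - 1}{\underline{K}}$$

with probability at least

$$\begin{cases} 1 - \exp\left( -\dfrac{N}{8}\, \dfrac{\epsilon^2 \bar{\lambda}(Q)^2}{\epsilon^2 \bar{\lambda}(Q)^2 + \mathrm{var}(\lambda(Q))} \right), & \epsilon \in (0, \epsilon_{\max}) \setminus \mathcal{I}_\epsilon, \\[2ex] 1 - \exp\left( -\dfrac{N}{8}\, \dfrac{\epsilon \bar{\lambda}(Q)}{\lambda_{\max}(Q) - (1 + \epsilon) \bar{\lambda}(Q)} \right), & \epsilon \in \mathcal{I}_\epsilon, \\[2ex] 1, & \epsilon \ge \epsilon_{\max}, \end{cases} \tag{4.12}$$

where $\mathrm{var}(\lambda(Q)) = \frac{1}{N} \sum_{n=1}^{N} \left(\lambda_n(Q) - \bar{\lambda}(Q)\right)^2$ is the variance of the eigenvalues of $Q$,

$$\epsilon_{\max} = \lambda_{\max}(Q) / \bar{\lambda}(Q) - 1,$$

and

$$\mathcal{I}_\epsilon = \begin{cases} \left( \dfrac{1}{4}\left( \epsilon_{\max} - \sqrt{\epsilon_{\max}^2 - \dfrac{8\, \mathrm{var}(\lambda(Q))}{\bar{\lambda}(Q)^2}} \right),\ \dfrac{1}{4}\left( \epsilon_{\max} + \sqrt{\epsilon_{\max}^2 - \dfrac{8\, \mathrm{var}(\lambda(Q))}{\bar{\lambda}(Q)^2}} \right) \right), & \left(\lambda_{\max}(Q) - \bar{\lambda}(Q)\right)^2 > 8\, \mathrm{var}(\lambda(Q)), \\[2ex] \emptyset, & \left(\lambda_{\max}(Q) - \bar{\lambda}(Q)\right)^2 \le 8\, \mathrm{var}(\lambda(Q)). \end{cases} \tag{4.13}$$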
Proof As noted above, it suffices to replace the bound in (4.9) with

$$E_0(K) \le (1 + \epsilon)\, \bar{\lambda}(Q)\, S_K(\{c_n^2\}) \tag{4.14}$$

and show that (4.14) holds with the probabilities indicated in the theorem statement. The remainder of the proof proceeds as in Theorem 4.2. First note that for $\epsilon \ge \epsilon_{\max}$, (4.14) is implied by (4.9) and is therefore true with probability 1. For $\epsilon \in (0, \epsilon_{\max})$, we use an upper bound on $E_0(K)$ to bound the probability that (4.14) is violated. Choosing the same subset $Z(K)$ as in Theorem 4.2 and using the definition of the Schur complement, we have

$$E_0(K) \le c_{Z(K)}^T Q_{Z(K)Z(K)}\, c_{Z(K)} = \tilde{c}^T \Lambda \tilde{c}, \qquad \tilde{c} = V^T \begin{bmatrix} c_{Z(K)} \\ 0 \end{bmatrix}, \tag{4.15}$$

where $\Lambda$ is the diagonal matrix of eigenvalues of $Q$. The assumption on $V$ implies that $\tilde{c}$ is distributed uniformly over the sphere of radius $\sqrt{S_K(\{c_n^2\})}$ centered at the origin. Hence the quantity $\tilde{c}^T \Lambda \tilde{c}$ can be equivalently expressed as $S_K(\{c_n^2\}) (z^T \Lambda z / z^T z)$, where the components of $z$ are independent standard normal random variables.

We now bound the probability that $\tilde{c}^T \Lambda \tilde{c} > (1 + \epsilon) \bar{\lambda}(Q) S_K(\{c_n^2\})$, which in turn bounds the probability that (4.14) is not satisfied. The event in question can be rewritten as

$$S = \sum_{n=1}^{N} \left[ \lambda_n(Q) - (1 + \epsilon) \bar{\lambda}(Q) \right] z_n^2 = \sum_{n=1}^{N} \delta_n z_n^2 > 0.$$

It can be seen that the expected value of $S$ is equal to $-\epsilon N \bar{\lambda}(Q)$, and hence we are bounding the probability that a linear combination of independent chi-squared random variables exceeds its mean by $\epsilon N \bar{\lambda}(Q)$. A straightforward application of the
Chernoff bound [SI yields 

$$\log \Pr(S > 0) \le \min_{t \ge 0} \; -\frac{1}{2} \sum_{n=1}^{N} \log(1 - 2 \delta_n t),$$

where $\delta_{\max} = \lambda_{\max}(Q) - (1 + \epsilon) \bar{\lambda}(Q)$. To derive a closed-form expression for the Chernoff exponent, the function $-(1/2) \log(1 - 2 \delta_n t)$ is bounded from above by the quadratic function $2 \delta_n^2 t^2 + \delta_n t$ over the interval $[0, 1/(4 \delta_{\max})]$ (this upper bound can be verified by comparing derivatives over $[0, 1/(4 \delta_{\max})]$). It follows that

$$\log \Pr(S > 0) \le \min_{0 \le t \le 1/(4 \delta_{\max})} N \left[ 2 \left( \mathrm{var}(\lambda(Q)) + \epsilon^2 \bar{\lambda}(Q)^2 \right) t^2 - \epsilon \bar{\lambda}(Q)\, t \right], \tag{4.16}$$

using the definition of $\mathrm{var}(\lambda(Q))$. We consider the two cases in which the unconstrained minimizer $t^* = (1/4)\, \epsilon \bar{\lambda}(Q) / \left( \mathrm{var}(\lambda(Q)) + \epsilon^2 \bar{\lambda}(Q)^2 \right)$ is either less than or greater than $1/(4 \delta_{\max})$. These correspond to the first two cases in (4.12). In the first case, substituting $t = t^*$ into (4.16) yields the exponent in (4.12) directly, while in the second case, the exponent in (4.12) results from substituting $t = 1/(4 \delta_{\max})$ in (4.16) and then using the assumed inequality $t^* > 1/(4 \delta_{\max})$. Solving the boundary condition $t^* = 1/(4 \delta_{\max})$ for $\epsilon$ yields the expression in (4.13) for the interval $\mathcal{I}_\epsilon$. □
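The quadratic Chernoff exponent in (4.16) is easy to evaluate numerically. The sketch below is only an illustration under the proof's assumptions; the function name is hypothetical, and it minimizes the quadratic over $[0, 1/(4\delta_{\max})]$ directly instead of using the closed-form cases of the theorem.

```python
import math


def chernoff_violation_bound(eigs, eps):
    """Upper bound on Pr(S > 0) obtained by minimizing the quadratic in (4.16).

    eigs: eigenvalues of Q; eps: a positive tolerance. One minus the returned
    value is a lower bound on the probability that (4.14) holds.
    """
    N = len(eigs)
    mean = sum(eigs) / N
    var = sum((lam - mean) ** 2 for lam in eigs) / N
    delta_max = max(eigs) - (1.0 + eps) * mean
    if delta_max <= 0:
        return 0.0  # eps >= eps_max: (4.14) holds with probability 1
    a = 2.0 * (var + (eps * mean) ** 2)  # coefficient of t^2 (per unit N)
    b = eps * mean                       # coefficient of -t (per unit N)
    t_star = b / (2.0 * a)               # unconstrained minimizer
    t = min(t_star, 1.0 / (4.0 * delta_max))
    return math.exp(N * (a * t * t - b * t))
```

When the unconstrained minimizer is feasible, the exponent equals $-(N/8)\, \epsilon^2 \bar{\lambda}^2 / (\epsilon^2 \bar{\lambda}^2 + \mathrm{var}(\lambda))$; for 90 eigenvalues equal to 1, 10 equal to 11 and $\epsilon = 0.5$, this is $-1.25$.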



Theorem 4.4 can be significantly less conservative than Theorem 4.2, in particular when most of the eigenvalues are small and comparable so that the mean eigenvalue of $Q$ is much closer to the minimum eigenvalue than to the maximum eigenvalue. This preference for eigenvalue distributions weighted toward small values is seen in the numerical results in Sect. 5. Furthermore, it agrees with the following geometric intuition: Assuming that the ellipsoid $\mathcal{E}_Q$ is not close to spherical ($\kappa(Q)$ is large), it is preferable for most of the ellipsoid axes to be comparatively long (corresponding to small eigenvalues) and of the same order. Such an ellipsoid tends to require a smaller coordinate-aligned enclosing ellipsoid, and consequently the diagonal relaxation tends to be a better approximation. For example, in three dimensions, a severely oblate spheroid can be enclosed on average in a smaller coordinate-aligned ellipsoid than an equally severely prolate spheroid. Note also that the exponents in (4.12) depend on the eigenvalue distribution and are larger (i.e., the decay is sharper) when the spread of the eigenvalues is small as measured by $\mathrm{var}(\lambda(Q))$ or $\lambda_{\max}(Q) - \bar{\lambda}(Q)$.



4.4 The diagonally dominant case 



We now consider the case in which the matrix $Q$ is diagonally dominant, specifically in the sense that

$$\max_{m} \sum_{n \ne m} \frac{|Q_{mn}|}{\sqrt{Q_{mm} Q_{nn}}} < 1, \tag{4.17}$$

i.e., the absolute sum of the normalized off-diagonal entries in any row or column is small. It is expected in this case that the original problem (1.1) can be well-approximated by its diagonal relaxation, and that the quality of approximation depends on the degree of diagonal dominance. Indeed, it can be shown that the maximum numbers of zero-valued components in (1.1) and its diagonal relaxation, $K^*$ and $K_d$ respectively, are bounded by the following quantities related to diagonal dominance:

$$\underline{K}_{dd} = \max\left\{ K : \left( 1 + \max_{m \in Z_{dd}(K)} \sum_{n \in Z_{dd}(K),\, n \ne m} \frac{|Q_{mn}|}{\sqrt{Q_{mm} Q_{nn}}} \right) S_K(\{Q_{nn} c_n^2\}) \le \gamma \right\}, \tag{4.18a}$$

$$\bar{K}_{dd} = \max\left\{ K : \left( 1 - \max_{m} \sum_{n \ne m} \frac{|Q_{mn}|}{\sqrt{Q_{mm} Q_{nn}}} \right) S_K(\{Q_{nn} c_n^2\}) \le \gamma \right\}, \tag{4.18b}$$

where $Z_{dd}(K)$ in (4.18a) denotes the index set corresponding to the $K$ smallest $Q_{nn} c_n^2$. A bound on the approximation ratio $K_d / K^*$ follows.



Theorem 4.5 Assume that the matrix $Q$ is diagonally dominant in the sense of (4.17). Then the maximum numbers of zero-valued components in problem (1.1) and its diagonal relaxation, $K^*$ and $K_d$ respectively, satisfy the ordering $\underline{K}_{dd} \le K^* \le K_d \le \bar{K}_{dd}$, where $\underline{K}_{dd}$ and $\bar{K}_{dd}$ are defined in (4.18). The approximation ratio $K_d / K^*$ is bounded as follows:

$$\frac{K_d}{K^*} \le \frac{\bar{K}_{dd}}{\underline{K}_{dd}} \le \frac{\left\lceil (\underline{K}_{dd} + 1)\, r_{dd} \right\rceil - 1}{\underline{K}_{dd}}, \tag{4.19}$$

where

$$r_{dd} = \frac{1 + \max\limits_{m \in Z_{dd}(\underline{K}_{dd}+1)} \sum\limits_{n \in Z_{dd}(\underline{K}_{dd}+1),\, n \ne m} \dfrac{|Q_{mn}|}{\sqrt{Q_{mm} Q_{nn}}}}{1 - \max\limits_{m} \sum\limits_{n \ne m} \dfrac{|Q_{mn}|}{\sqrt{Q_{mm} Q_{nn}}}}.$$

The ratio $r_{dd}$ in Theorem 4.5 plays the same role as the eigenvalue ratio in Theorem 4.2. As $Q$ becomes more diagonally dominant, $r_{dd}$ approaches 1 from above. Unlike with Theorem 4.2, there is no benefit to allowing diagonal scaling transformations because the measure of diagonal dominance used here remains unchanged when $Q$ is replaced by $S^{-1} Q S^{-1}$.
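The quantities in (4.17)-(4.19) involve only entries of $Q$ and can be computed directly. The following is a minimal sketch with hypothetical function names, taking $Q$ as a list of rows; the handling of the degenerate cases $\underline{K}_{dd} = 0$ and $\underline{K}_{dd} = N$ is an assumption of the sketch, not something specified in the text.

```python
import math


def max_offdiag_sum(Q, idx):
    """max over m in idx of sum_{n in idx, n != m} |Q[m][n]| / sqrt(Q[m][m] * Q[n][n])."""
    return max(
        (sum(abs(Q[m][n]) / math.sqrt(Q[m][m] * Q[n][n]) for n in idx if n != m)
         for m in idx),
        default=0.0,
    )


def dd_bounds(Q, c, gamma):
    """Return (K_lower, K_upper, r_dd, ratio_bound) following (4.18)-(4.19)."""
    N = len(c)
    costs = [Q[n][n] * c[n] ** 2 for n in range(N)]
    order = sorted(range(N), key=lambda n: costs[n])  # Z_dd(K) = order[:K]
    dominance = max_offdiag_sum(Q, list(range(N)))    # left-hand side of (4.17)
    if dominance >= 1:
        raise ValueError("Q is not diagonally dominant in the sense of (4.17)")

    def partial(K):
        return sum(costs[n] for n in order[:K])       # S_K({Q_nn c_n^2})

    K_lower = max(K for K in range(N + 1)
                  if (1 + max_offdiag_sum(Q, order[:K])) * partial(K) <= gamma)
    K_upper = max(K for K in range(N + 1)
                  if (1 - dominance) * partial(K) <= gamma)
    # r_dd uses Z_dd(K_lower + 1); capped at N when K_lower = N (sketch assumption).
    Z_next = order[:min(K_lower + 1, N)]
    r_dd = (1 + max_offdiag_sum(Q, Z_next)) / (1 - dominance)
    if K_lower > 0:
        bound = (math.ceil((K_lower + 1) * r_dd) - 1) / K_lower
    else:
        bound = math.inf                              # ratio undefined (sketch assumption)
    return K_lower, K_upper, r_dd, bound
```

For a $3 \times 3$ matrix with unit diagonal and off-diagonal entries 0.1, $c = (0.5, 0.6, 1.0)$ and $\gamma = 0.7$, both $\underline{K}_{dd}$ and $\bar{K}_{dd}$ equal 2, so the theorem pins down $K^* = K_d = 2$ without solving either problem.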

To prove the inequality $K_d \le \bar{K}_{dd}$, we use the following lemma, which specifies the optimal cost of (4.4) under the additional constraint that $D$ is a multiple of a fixed diagonal matrix.

Lemma 4.6 For any positive definite diagonal matrix $D_0$, the optimal cost $E_d(K)$ in (4.4) is bounded from below by $\lambda_{\min}(D_0^{-1/2} Q D_0^{-1/2})\, S_K(\{(D_0)_{nn} c_n^2\})$.

Proof We restrict $D$ in (4.4) to be a multiple of $D_0$, thus obtaining a lower bound on $E_d(K)$. With $D = \alpha D_0$, (4.4) reduces to

$$\max_{\alpha} \; \alpha\, S_K(\{(D_0)_{nn} c_n^2\}) \quad \text{s.t.} \quad 0 \preceq \alpha D_0 \preceq Q.$$

Since $D_0$ is invertible, the constraint can be rewritten as $0 \preceq \alpha I \preceq D_0^{-1/2} Q D_0^{-1/2}$, from which it follows that $\alpha$ should be chosen as the smallest eigenvalue of $D_0^{-1/2} Q D_0^{-1/2}$. □

We now proceed with the proof of Theorem 4.5.

Proof (Theorem 4.5) To prove that $K_d \le \bar{K}_{dd}$, we let $D_0 = \mathrm{Diag}(Q)$ in Lemma 4.6, where $\mathrm{Diag}(Q)$ denotes a diagonal matrix with the same diagonal entries as $Q$. Using the Gershgorin circle theorem [17] to bound the smallest eigenvalue of $\tilde{Q} = \mathrm{Diag}(Q)^{-1/2}\, Q\, \mathrm{Diag}(Q)^{-1/2}$, we then obtain

$$E_d(K) \ge \left( 1 - \max_{m} \sum_{n \ne m} \frac{|Q_{mn}|}{\sqrt{Q_{mm} Q_{nn}}} \right) S_K(\{Q_{nn} c_n^2\}),$$

from which we infer that $K_d \le \bar{K}_{dd}$ based on (4.18b).

To prove that $K^* \ge \underline{K}_{dd}$, the quantity $E_0(K)$ in (2.3) is bounded from above as follows, starting with the specific choice of subset $Z = Z_{dd}(K)$:

$$\begin{aligned} E_0(K) &\le c_{Z_{dd}(K)}^T \left( Q / Q_{Y_{dd}(K) Y_{dd}(K)} \right) c_{Z_{dd}(K)} \\ &\le c_{Z_{dd}(K)}^T\, Q_{Z_{dd}(K) Z_{dd}(K)}\, c_{Z_{dd}(K)} \\ &= \left( \mathrm{Diag}(Q)^{1/2} c \right)_{Z_{dd}(K)}^T \tilde{Q}_{Z_{dd}(K) Z_{dd}(K)} \left( \mathrm{Diag}(Q)^{1/2} c \right)_{Z_{dd}(K)} \\ &\le \lambda_{\max}\left( \tilde{Q}_{Z_{dd}(K) Z_{dd}(K)} \right) S_K(\{Q_{nn} c_n^2\}) \\ &\le \left( 1 + \max_{m \in Z_{dd}(K)} \sum_{n \in Z_{dd}(K),\, n \ne m} \frac{|Q_{mn}|}{\sqrt{Q_{mm} Q_{nn}}} \right) S_K(\{Q_{nn} c_n^2\}). \end{aligned}$$

The second line follows from the definition of the Schur complement, the third from a rescaling, the fourth from eigenvalue properties and the definition of $Z_{dd}(K)$, and the last from the Gershgorin circle theorem. Comparing with (4.18a), we conclude that $K^* \ge \underline{K}_{dd}$. The proof of the bound on $\bar{K}_{dd} / \underline{K}_{dd}$ is similar to that in Theorem 4.2. □

As with Theorem 4.2, there exist instances for which the left-hand bound in (4.19) is tight and the right-hand bound is asymptotically tight. We consider the same class of instances as in Sect. 3.2 with $c = e$, $\gamma = 1$, and $v$ having $\lceil N/2 \rceil$ components equal to $+1/\sqrt{N}$ and $\lfloor N/2 \rfloor$ components equal to $-1/\sqrt{N}$. From (3.6) we obtain $Q_{nn} = (N - 1) \lambda_2 / N + \lambda_1 / N$ for all $n$ and $|Q_{mn}| = (\lambda_2 - \lambda_1) / N$ for all $m \ne n$, from which it follows that

$$\max_{m \in Z} \sum_{n \in Z,\, n \ne m} \frac{|Q_{mn}|}{\sqrt{Q_{mm} Q_{nn}}} = \frac{(K - 1)(\lambda_2 - \lambda_1)}{(N - 1) \lambda_2 + \lambda_1}$$

for any $Z$ of cardinality $K$. Choosing $\lambda_1 = 1/N$ and $\lambda_2 = 1/N + 1/((N - 1)(2N - 3))$, some straightforward calculations yield $\underline{K}_{dd} = N - 1$ and $\bar{K}_{dd} = N$ from (4.18), and $r_{dd} = 1 + 2/(2N - 3)$ for the ratio defined in Theorem 4.5. It can then be seen that the right-hand inequality in (4.19) reads $N/(N - 1) \le (N + 1)/(N - 1)$ for $N \ge 3$, which is asymptotically tight as $N \to \infty$.

To show that the left-hand inequality in (4.19) is tight, we note that the construction satisfies the assumptions of Lemma 4.1 so we again have $E_d(K) = K \lambda_{\min}(Q) = K/N$ and $K_d = N = \bar{K}_{dd}$. The remaining required equality $K^* = \underline{K}_{dd} = N - 1$ is equivalent to the all-zero solution being infeasible for (1.1), i.e., $c^T Q c > \gamma = 1$. Using (3.6) and substituting the selected parameter values, we find $c^T Q c = 1 + N/((N - 1)(2N - 3)) > 1$ for $N$ even and $c^T Q c = 1 + (N + 1)/(N(2N - 3)) > 1$ for $N$ odd, completing the demonstration.



4.5 The nearly coordinate-aligned case

A geometric analogue to diagonal dominance is the case in which the axes of the ellipsoid $\mathcal{E}_Q$ are nearly aligned with the coordinate axes. Algebraically, this corresponds to the eigenvectors of $Q$ being close to the standard basis vectors. We assume that $Q$ is diagonalized as $Q = V \Lambda V^T$, where the eigenvalues $\lambda_n(Q)$ and the eigenvector matrix $V$ are ordered in such a way that $\Delta = V - I$ is small, specifically in the sense that its spectral radius $\rho(\Delta)$ satisfies $\kappa(Q) \rho(\Delta) < 1$. It is expected in this case that the diagonal relaxation would give a better approximation for smaller $\Delta$, i.e., for closer alignments. Following the approach in Sects. 4.3-4.4, it is shown that $K^*$ and $K_d$ may be bounded by

$$\underline{K}_{na} = \max\left\{ K : \left( 1 + \kappa(Q) \left( \rho(\Delta) + \rho(\Delta)^2 \right) \right) S_K(\{\lambda_n(Q) c_n^2\}) \le \gamma \right\}, \tag{4.20a}$$

$$\bar{K}_{na} = \max\left\{ K : \left( 1 - \kappa(Q) \rho(\Delta) \right) S_K(\{\lambda_n(Q) c_n^2\}) \le \gamma \right\}. \tag{4.20b}$$

The approximation ratio $K_d / K^*$ may be bounded accordingly.

Theorem 4.7 Assume that the matrix $Q$ can be diagonalized as $Q = (I + \Delta) \Lambda (I + \Delta)^T$, where $\Delta$ is such that $\kappa(Q) \rho(\Delta) < 1$. Then the maximum numbers of zero-valued components in (1.1) and its diagonal relaxation, $K^*$ and $K_d$ respectively, satisfy the ordering $\underline{K}_{na} \le K^* \le K_d \le \bar{K}_{na}$, where $\underline{K}_{na}$ and $\bar{K}_{na}$ are defined in (4.20). The approximation ratio $K_d / K^*$ is bounded as follows:

$$\frac{K_d}{K^*} \le \frac{\bar{K}_{na}}{\underline{K}_{na}} \le \frac{\left\lceil (\underline{K}_{na} + 1)\, r_{na} \right\rceil - 1}{\underline{K}_{na}}, \tag{4.21}$$

where

$$r_{na} = \frac{1 + \kappa(Q) \left( \rho(\Delta) + \rho(\Delta)^2 \right)}{1 - \kappa(Q) \rho(\Delta)}.$$

Theorem 4.7 characterizes the quality of approximation in terms of the ratio $r_{na}$. As $\Delta$ approaches 0, $r_{na}$ approaches 1 as expected. Similar to Theorem 4.2, Theorem 4.7 may be strengthened using diagonal scaling transformations since both $\rho(\Delta)$ and the condition number $\kappa(Q)$ may decrease as $Q$ is transformed into $S^{-1} Q S^{-1}$ for different choices of $S$. The dependence on the condition number can be explained geometrically as illustrated in Fig. 4.3. On the left, the original ellipsoid $\mathcal{E}_Q$ is both nearly coordinate-aligned and nearly spherical (i.e., $\kappa(Q)$ is close to 1), and can therefore be enclosed by a coordinate-aligned ellipsoid that is only slightly larger. Indeed in the limit $\kappa(Q) = 1$, $\mathcal{E}_Q$ is spherical and thus already coordinate-aligned, and the eigenvector matrix $V$ can be chosen equal to $I$ resulting in $\Delta = 0$. On the other hand, if $\kappa(Q)$ is large, even a small misalignment between the ellipsoid and coordinate axes results in a much larger enclosing ellipsoid, as seen on the right in Fig. 4.3.
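Given the eigenvalues, the condition number and the spectral radius $\rho(\Delta)$, the bounds of Theorem 4.7 reduce to scalar arithmetic. The sketch below uses a hypothetical function name and assumes the caller supplies $\kappa(Q)$ and $\rho(\Delta)$, since computing them requires an eigendecomposition that is outside the scope of this illustration.

```python
import math


def na_bounds(eigs, c, kappa, rho, gamma):
    """Return (K_lower, K_upper, r_na, ratio_bound) following (4.20)-(4.21).

    eigs[n] is the eigenvalue lambda_n(Q) paired with coordinate n, kappa is
    the condition number of Q, and rho is the spectral radius of
    Delta = V - I; all three are supplied by the caller.
    """
    if kappa * rho >= 1:
        raise ValueError("Theorem 4.7 requires kappa * rho < 1")
    costs = sorted(lam * c_n ** 2 for lam, c_n in zip(eigs, c))
    up = 1 + kappa * (rho + rho ** 2)  # multiplier in (4.20a)
    down = 1 - kappa * rho             # multiplier in (4.20b)
    N = len(costs)
    K_lower = max(K for K in range(N + 1) if up * sum(costs[:K]) <= gamma)
    K_upper = max(K for K in range(N + 1) if down * sum(costs[:K]) <= gamma)
    r_na = up / down
    if K_lower > 0:
        bound = (math.ceil((K_lower + 1) * r_na) - 1) / K_lower
    else:
        bound = math.inf               # ratio undefined (sketch assumption)
    return K_lower, K_upper, r_na, bound
```

With eigenvalues (1, 1, 2, 4), $c = e$, $\kappa(Q) = 4$, $\rho(\Delta) = 0.1$ and $\gamma = 2.5$, the multipliers are 1.44 and 0.6, giving $\underline{K}_{na} = 1$, $\bar{K}_{na} = 3$ and $r_{na} = 2.4$. The gap between the two bounds shows how quickly the guarantee loosens once $\kappa(Q) \rho(\Delta)$ is not small.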





Fig. 4.3 The effect of the condition number $\kappa(Q)$ on the approximation quality in the nearly coordinate-aligned case. For the same angular offset $\theta$ between the axes of the original ellipsoid and the coordinate axes, the coordinate-aligned enclosing ellipsoid on the right is comparatively larger.



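In the proof of Theorem 4.7 below, we make reference to the scaled matrix $\Lambda^{-1/2} Q \Lambda^{-1/2}$. When $\Delta$ is small, $\Lambda^{-1/2} Q \Lambda^{-1/2}$ is close to the identity matrix and the deviation of its eigenvalues from 1 is specified by the following lemma.

Lemma 4.8 Assume that the matrix $Q$ can be diagonalized as $Q = (I + \Delta) \Lambda (I + \Delta)^T$, where $\Delta$ is such that $\kappa(Q) \rho(\Delta) < 1$. Then

$$\lambda_{\min}(\Lambda^{-1/2} Q \Lambda^{-1/2}) \ge 1 - \kappa(Q) \rho(\Delta),$$

$$\lambda_{\max}(\Lambda^{-1/2} Q \Lambda^{-1/2}) \le 1 + \kappa(Q) \left( \rho(\Delta) + \rho(\Delta)^2 \right).$$

Proof Expanding $\Lambda^{-1/2} Q \Lambda^{-1/2}$ yields $I + \tilde{\Delta} + \tilde{\Delta}^T + \tilde{\Delta} \tilde{\Delta}^T$, where $\tilde{\Delta} = \Lambda^{-1/2} \Delta \Lambda^{1/2}$. The eigenvalues of $\Lambda^{-1/2} Q \Lambda^{-1/2}$ can then be bounded by

$$\lambda_{\min}(\Lambda^{-1/2} Q \Lambda^{-1/2}) \ge 1 + \lambda_{\min}(\tilde{\Delta} + \tilde{\Delta}^T), \tag{4.22a}$$

$$\lambda_{\max}(\Lambda^{-1/2} Q \Lambda^{-1/2}) \le 1 + \lambda_{\max}(\tilde{\Delta} + \tilde{\Delta}^T) + \lambda_{\max}(\tilde{\Delta} \tilde{\Delta}^T), \tag{4.22b}$$

noting that $\tilde{\Delta} \tilde{\Delta}^T$ is positive semidefinite in (4.22a). The rightmost term in (4.22b) can be bounded using the sub-multiplicative property of the spectral norm [17]:

$$\lambda_{\max}(\tilde{\Delta} \tilde{\Delta}^T) = \left\| \Lambda^{-1/2} \Delta \Lambda^{1/2} \right\|_2^2 \le \left\| \Lambda^{-1/2} \right\|_2^2 \left\| \Delta \right\|_2^2 \left\| \Lambda^{1/2} \right\|_2^2 = \lambda_{\max}(Q)\, \rho(\Delta)^2\, \lambda_{\min}^{-1}(Q) = \kappa(Q) \rho(\Delta)^2.$$

To bound the eigenvalues of $\tilde{\Delta} + \tilde{\Delta}^T$, we make use of a diagonalization of $\tilde{\Delta}$. Given that $V$ is orthogonal, it has unit-modulus eigenvalues and can be diagonalized by a unitary matrix $U$. From the relations $\Delta = V - I$ and $\rho(\Delta) < 1/\kappa(Q)$, we see that $\Delta$ can be diagonalized as $\Delta = U \Gamma U^H$, where the eigenvalues of $\Delta$ lie on the highlighted arc in Fig. 4.4. It follows that $\tilde{\Delta} = \tilde{U} \Gamma \tilde{U}^{-1}$ with $\tilde{U} = \Lambda^{-1/2} U$.

We now invoke a theorem from [17], which states that for any eigenvalue of $\tilde{\Delta} + \tilde{\Delta}^T$, there exists an eigenvalue of $\tilde{\Delta}$ such that

$$\left| \lambda(\tilde{\Delta} + \tilde{\Delta}^T) - \lambda(\tilde{\Delta}) \right| \le \left\| \tilde{U}^{-1} \tilde{\Delta}^T \tilde{U} \right\|_2.$$

Expanding the right-hand side of this inequality and using the sub-multiplicative property of spectral norms, we obtain

$$\left| \lambda(\tilde{\Delta} + \tilde{\Delta}^T) - \lambda(\tilde{\Delta}) \right| \le \left\| U^{-1} \right\|_2 \left\| \Lambda \right\|_2 \left\| \Delta^T \right\|_2 \left\| \Lambda^{-1} \right\|_2 \left\| U \right\|_2 = \kappa(Q) \rho(\Delta). \tag{4.23}$$

The bound in (4.23) constrains the eigenvalues of $\tilde{\Delta} + \tilde{\Delta}^T$ to lie within a Euclidean distance of $\kappa(Q) \rho(\Delta)$ from the arc in Fig. 4.4. Furthermore, the symmetry of $\tilde{\Delta} + \tilde{\Delta}^T$ implies that its eigenvalues are real-valued. It is clear then that $\lambda_{\max}(\tilde{\Delta} + \tilde{\Delta}^T) \le \kappa(Q) \rho(\Delta)$. From Fig. 4.4 and the assumption that $\kappa(Q) \rho(\Delta) < 1$, it can also be seen that $\lambda_{\min}(\tilde{\Delta} + \tilde{\Delta}^T)$ is minimized by setting $\lambda(\tilde{\Delta}) = 0$ in (4.23) since all other choices for $\lambda(\tilde{\Delta})$ would yield more positive values for $\lambda_{\min}(\tilde{\Delta} + \tilde{\Delta}^T)$. Substituting the resulting bound $\lambda_{\min}(\tilde{\Delta} + \tilde{\Delta}^T) \ge -\kappa(Q) \rho(\Delta)$ into (4.22a) completes the proof. □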

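Theorem 4.7 can now be proved straightforwardly using previous results.

Proof (Theorem 4.7) As in the proof of Theorem 4.5, we use Lemma 4.6 to show that $K_d \le \bar{K}_{na}$, this time choosing $D_0 = \Lambda$. Combining Lemma 4.6 with Lemma 4.8 then yields $E_d(K) \ge (1 - \kappa(Q) \rho(\Delta))\, S_K(\{\lambda_n(Q) c_n^2\})$, which implies that $K_d \le \bar{K}_{na}$ in light of (4.20b).

To prove that $K^* \ge \underline{K}_{na}$, we proceed as in the proof of Theorem 4.5 by fixing a specific subset $Z_{na}(K)$ corresponding to the $K$ smallest $\lambda_n(Q) c_n^2$. This yields

$$\begin{aligned} E_0(K) &\le c_{Z_{na}(K)}^T\, Q_{Z_{na}(K) Z_{na}(K)}\, c_{Z_{na}(K)} \\ &= \left( \Lambda^{1/2} \begin{bmatrix} c_{Z_{na}(K)} \\ 0 \end{bmatrix} \right)^T \Lambda^{-1/2} Q \Lambda^{-1/2} \left( \Lambda^{1/2} \begin{bmatrix} c_{Z_{na}(K)} \\ 0 \end{bmatrix} \right) \\ &\le \lambda_{\max}(\Lambda^{-1/2} Q \Lambda^{-1/2})\, S_K(\{\lambda_n(Q) c_n^2\}) \\ &\le \left( 1 + \kappa(Q) \left( \rho(\Delta) + \rho(\Delta)^2 \right) \right) S_K(\{\lambda_n(Q) c_n^2\}). \end{aligned}$$

In the second line above, the quadratic form has been rewritten in terms of the full matrix $Q$ and then rescaled. The last two lines result from the definition of $Z_{na}(K)$ and Lemma 4.8. Combining the last inequality with (4.20a) yields $K^* \ge \underline{K}_{na}$ as desired. The proof of the bound on $\bar{K}_{na} / \underline{K}_{na}$ is similar to that in Theorem 4.2. □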



5 Numerical evaluation 

In this section, numerical results are presented to illustrate the performance of the two relaxations discussed in Sect. 3 and Sect. 4. In Sect. 5.1, the relaxations are compared on the basis of their approximation ratios under different conditions. In Sect. 5.2, the relaxations are incorporated in a branch-and-bound algorithm to gauge their effectiveness in reducing the complexity of solving problem (1.1).

5.1 Approximation ratios

Randomly generated instances of problem (1.1) are used in this section to evaluate the approximation quality of the two relaxations. While it was seen in Sect. 4.2 that neither relaxation dominates the other over all possible instances, the present comparison using random instances indicates that diagonal relaxations yield significantly stronger bounds in many situations, including but not limited to those analyzed in Sects. 4.3-4.5.

In these experiments, the problem dimension $N$ is varied between 10 and 150 and the parameter $\gamma$ is normalized to 1 throughout. The continuous relaxation of each instance, and more specifically the dual problem (3.10), is solved using the MATLAB function fmincon. A customized solver described in [26, Sect. 3.5] is used for the diagonal relaxation; a general-purpose semidefinite optimization solver such as SDPT3 [24] or SeDuMi [23] could also be used. In addition, a feasible solution is obtained for each instance using the backward greedy selection method in [28]. The cost of this feasible solution is used as a substitute for the true optimal cost, which is difficult to compute given the large number of instances. Numerical experience in [28] however suggests that backward greedy selection is often optimal. The approximation quality of each relaxation is measured by the ratio of the optimal cost of the relaxation to the cost of the feasible solution. These ratios are denoted $R_c$ and $R_d$ for continuous and diagonal relaxations respectively; they are lower bounds on the true approximation ratios. Note that we are returning to the original definition of approximation ratio in terms of the number of non-zero components and not the number of zero-valued components as in Sects. 4.3-4.5.

In the first three experiments, the eigenvector matrix $V$ of $Q$ is chosen uniformly from the set of $N \times N$ orthogonal matrices (as assumed in Theorem 4.4). The eigenvalues are drawn from different power-law distributions and then rescaled to match a specified condition number $\kappa(Q)$ chosen from the values $\sqrt{N}$, $N$, $10N$, and $100N$. Once $Q$ is fixed, each component $c_n$ of the ellipsoid center $c$ is drawn uniformly from the interval $\left[ -\sqrt{(Q^{-1})_{nn}}, \sqrt{(Q^{-1})_{nn}} \right]$, in keeping with Assumption 2.1.

Fig. 5.1(a) plots the approximation ratios $R_c$ and $R_d$ as functions of $N$ and $\kappa(Q)$ for an eigenvalue distribution proportional to $1/\lambda$, which corresponds to a uniform distribution for $\log \lambda$. Each point represents the average of 1000 instances. A $1/\lambda$ eigenvalue distribution is unbiased in the sense that it is invariant under matrix inversion (up to a possible overall scaling), an operation that maps the positive definite cone to itself. The continuous relaxation approximation ratio $R_c$ does not vary much with $N$ or $\kappa(Q)$. In contrast, the diagonal relaxation approximation ratio $R_d$ is markedly higher for lower $\kappa(Q)$, in agreement with Theorem 4.2 and the geometric intuition in Fig. 4.2. Moreover, $R_d$ improves with increasing $N$ so that even for $\kappa(Q) = 100N$ the diagonal relaxation outperforms the continuous relaxation for $N > 20$, with the difference being substantial at large $N$.

Figs. 5.1(b) and 5.1(c) show average approximation ratios for a uniform
eigenvalue distribution and a 1/λ² distribution, the latter corresponding to a uniform
distribution for the eigenvalues of Q⁻¹. Compared to a 1/λ distribution, a 1/λ²
distribution is more heavily weighted toward small values whereas a uniform distribution
is less so. Accordingly, each R_d curve in Fig. 5.1(b) is lower than its counterpart in
Fig. 5.1(a) while the opposite is true in Fig. 5.1(c), in agreement with the dependence on
the eigenvalue distribution in Theorem 4.4. The effect of the condition number on R_d
is also more pronounced under a uniform eigenvalue distribution and less so under a
1/λ² distribution. The behavior of R_c on the other hand is largely unchanged from
Fig. 5.1(a).

Fig. 5.1 Average approximation ratios R_c and R_d for (a) a 1/λ eigenvalue distribution, (b) a uniform
eigenvalue distribution, (c) a 1/λ² eigenvalue distribution, and (d) unit diagonal entries and off-diagonal
entries drawn uniformly from [−a, a]/√N. In (a)-(c), κ(Q) = √N, N, 10N, 100N from top to bottom within each
set of curves. In (d), a = 0.1, 0.2, 0.5, 0.8 from top to bottom within each set of curves.
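
The statements about how the three eigenvalue distributions behave under matrix inversion can be checked with a change of variables. The sketch below assumes a power-law density with exponent α on an interval [a, b] with 0 < a < b; the symbols α, a, b and μ are introduced only for this illustration.

```latex
p_{\lambda}(\lambda) \propto \lambda^{-\alpha}, \quad \lambda \in [a, b],
\qquad \mu = 1/\lambda
\quad\Longrightarrow\quad
p_{\mu}(\mu) = p_{\lambda}(1/\mu)\left|\frac{d\lambda}{d\mu}\right|
             \propto \mu^{\alpha}\,\mu^{-2} = \mu^{\alpha-2},
\quad \mu \in [1/b,\, 1/a].
```

For α = 1 the eigenvalues of the inverse again follow a 1/μ density, which is the invariance noted for the 1/λ distribution. For α = 2 the density of μ is constant, so the eigenvalues of the inverse are uniformly distributed, and for α = 0 (uniform λ) they follow a 1/μ² density.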

In a fourth experiment, the diagonal entries of Q are normalized to 1 while
the off-diagonal entries are drawn uniformly from the interval [−a, a]/√N, where
a = 0.1, 0.2, 0.5, 0.8. With high probability, such matrices are diagonally dominant
in the sense of (4.17) for a = 0.1, 0.2, and are not positive definite for a > 0.85.
The vector c is generated as before based on the diagonal entries of Q⁻¹. The
average approximation ratios are shown in Fig. 5.1(d). Similar to the condition number in
Figs. 5.1(a)-(c), the parameter a does not appear to have much effect on R_c. For the
diagonal relaxation, while Theorem 4.5 predicts a close approximation for a = 0.1, 0.2,
the performance is still relatively good for a = 0.8.

The results in Fig. 5.1 demonstrate that better bounds are achieved in many
instances with diagonal relaxations than with continuous relaxations. Furthermore, this
can be true even when the condition number κ(Q) or the off-diagonal amplitude a is
high, whereas the analysis in Sect. 4.3–4.5 is more conservative.
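
For intuition about why diagonal matrices give an easily solved special case, the following minimal sketch solves the cardinality minimization problem when the quadratic form is diagonal. It is not the customized solver of [26]: the function name and interface are hypothetical, and the sketch takes the diagonal entries d as given rather than optimizing them. If D = diag(d) is positive and Q − D is positive semidefinite (for example d_n equal to the smallest eigenvalue of Q, a valid but generally not optimized choice), the ellipsoid defined by D contains the one defined by Q, so the value returned is a lower bound on the optimal cost of (1.1).

```python
def min_cardinality_diagonal(d, c, gamma=1.0):
    """Minimum number of nonzeros of x subject to
    sum_n d[n] * (x[n] - c[n])**2 <= gamma, with all d[n] > 0.

    Zeroing x[n] costs d[n] * c[n]**2 and leaving x[n] = c[n] costs nothing,
    so the best strategy is to zero the cheapest components first.
    Returns (cardinality, sorted list of indices set to zero).
    """
    costs = sorted((dn * cn * cn, n) for n, (dn, cn) in enumerate(zip(d, c)))
    total = 0.0
    zeros = []
    for cost, n in costs:
        if total + cost > gamma:  # budget exhausted: remaining entries stay nonzero
            break
        total += cost
        zeros.append(n)
    return len(d) - len(zeros), sorted(zeros)
```

For example, with d = [1, 1, 1], c = [0.5, 0.6, 0.9] and γ = 1, the zeroing costs are 0.25, 0.36 and 0.81; the first two fit within the budget but all three do not, so the minimum cardinality is 1.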



5.2 Branch-and-bound complexity 

Next we consider the effect of the two relaxations on the complexity of a
branch-and-bound solution to (1.1). For this purpose, the relaxations are incorporated into a basic
MATLAB implementation of branch-and-bound, referred to as BB. This algorithm
is also compared to the mixed-integer programming solver CPLEX 12.4 [18] as a
point of reference. The comparisons show that diagonal relaxations can significantly
increase the efficiency of branch-and-bound. It is also seen that a more specialized
solver can outperform a sophisticated general-purpose solver in solving (1.1).

Algorithm BB is based on the mixed integer formulation (3.1) and is summarized
below. Full details can be found in [27]. The branching rule is to select the variable
for which the margin in condition (2.4) is minimal. This rule is similar to the
maximum absolute value rule in EH] in that the i_n = 0 subproblem is more likely to be
severely constrained. The next node is chosen according to the "best node" rule, i.e.,
a node with a minimal lower bound. Feasible solutions are generated by running the
backward selection heuristic at every node. To obtain lower bounds, condition (2.4)
is checked at every node and bounds are updated as appropriate. Variable elimination
as described in Sect. 2.2 is employed to reduce subproblem dimensions. For stronger
lower bounds, either continuous or diagonal relaxations are solved, corresponding to
two algorithm variants BB-C and BB-D. Relaxations are solved only after
constraining a variable to zero (i_n = 0 branch) and when the subproblem dimension is at least
20. In other cases, the increased computation does not seem to be justified by the
improvement in bounds.
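
The overall structure of a best-node branch-and-bound search of this kind can be sketched as follows. This is a simplified illustration, not the BB algorithm of [27]: all function names are hypothetical, the inverse matrix Q⁻¹ is assumed to be supplied directly, the lower bound is only the number of variables already forced to be nonzero (no continuous or diagonal relaxations are solved), and the backward selection heuristic and variable elimination are omitted. As a stand-in for condition (2.4), which is not reproduced here, the sketch uses the standard identity that the minimum of (x − c)ᵀQ(x − c) over all x with x_n = 0 for n in Z equals c_Zᵀ((Q⁻¹)_ZZ)⁻¹c_Z, so the variables in Z can be zero simultaneously exactly when this quantity is at most γ.

```python
import heapq
import itertools


def solve_linear(A, b):
    """Solve A x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for j in range(k, n + 1):
                M[r][j] -= f * M[k][j]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x


def zero_cost(Qinv, c, Z):
    """Smallest value of (x-c)^T Q (x-c) over x with x[n] = 0 for all n in Z."""
    idx = sorted(Z)
    if not idx:
        return 0.0
    A = [[Qinv[i][j] for j in idx] for i in idx]
    cz = [c[i] for i in idx]
    return sum(ci * yi for ci, yi in zip(cz, solve_linear(A, cz)))


def branch_and_bound(Qinv, c, gamma=1.0):
    """Best-node search; returns (minimum cardinality, sorted zero set)."""
    n = len(c)
    best_zero = frozenset()          # incumbent: largest feasible zero set so far
    tie = itertools.count()          # tie-breaker so the heap never compares sets
    heap = [(0, next(tie), frozenset(), frozenset())]  # (bound, id, zeros Z, nonzeros P)
    while heap:
        lb, _, Z, P = heapq.heappop(heap)
        if lb >= n - len(best_zero):
            break                    # no remaining node can beat the incumbent
        free = [i for i in range(n) if i not in Z and i not in P]
        cost = {i: zero_cost(Qinv, c, Z | {i}) for i in free}
        # a free variable that cannot join the zero set must be nonzero
        P = P | frozenset(i for i in free if cost[i] > gamma)
        free = [i for i in free if i not in P]
        if len(P) >= n - len(best_zero):
            continue                 # prune: bound no better than incumbent
        i = max(free, key=lambda k: cost[k])  # least slack when set to zero
        Z0 = Z | {i}                 # feasible by construction
        if len(Z0) > len(best_zero):
            best_zero = Z0
        heapq.heappush(heap, (len(P), next(tie), Z0, P))
        heapq.heappush(heap, (len(P) + 1, next(tie), Z, P | {i}))
    return n - len(best_zero), sorted(best_zero)
```

In a fuller implementation, the bound `len(P)` pushed for each child would be replaced by the larger of this count and the optimal cost of a continuous or diagonal relaxation of the subproblem, which is where the variants BB-C and BB-D would differ.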

For CPLEX, the split-variable mixed integer formulation corresponding to (3.4)
and (3.5) is passed to the CPLEX MEX executable through the provided MATLAB
interface. Because of the relative inefficiency of CPLEX as seen below, BB is run 
first and the optimal solution is used to initialize CPLEX. Given this initialization, 
CPLEX is instructed to emphasize optimality rather than feasibility, while all other 
options are set to their default values. Preliminary experimentation with changing 
solver parameters did not yield any gains. The experiments are run on a 2.4 GHz 
quad-core Linux computer with 8 GB of memory. BB is generally not observed to 
use more than one core at a time; CPLEX however is able to continuously exploit all 
four cores. 

Problem instances are generated randomly from the same four classes and in the
same manner as in Sect. 5.1, thus satisfying Assumption 2.1 in particular. Table 5.1
shows the solution times and numbers of nodes for the first three classes in which
the eigenvalues of Q are drawn from different distributions. Each entry represents
the average over 100 instances. For certain instance classes and solvers, the high
computational complexity does not permit an accurate evaluation. In these cases, the
solution time is estimated by extrapolating from lower values of N; such estimates
are marked by parentheses.
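
The extrapolation procedure is not specified in the text. One simple possibility, shown here purely as an assumption, is to fit a straight line to the logarithm of the solution time as a function of N (that is, to assume exponential growth) and evaluate the fit at the larger dimension. The function names are hypothetical.

```python
import math


def fit_log_linear(ns, times):
    """Least-squares fit of log(time) = a + b * N; returns (a, b)."""
    logs = [math.log(t) for t in times]
    m = len(ns)
    n_mean = sum(ns) / m
    l_mean = sum(logs) / m
    sxx = sum((n - n_mean) ** 2 for n in ns)
    sxy = sum((n - n_mean) * (l - l_mean) for n, l in zip(ns, logs))
    b = sxy / sxx
    a = l_mean - b * n_mean
    return a, b


def extrapolate_time(ns, times, n_target):
    """Predicted solution time at dimension n_target under the assumed model."""
    a, b = fit_log_linear(ns, times)
    return math.exp(a + b * n_target)
```

As an illustration of the assumed model only, the two BB-C timings reported in Table 5.1 for the 1/λ distribution with κ(Q) = N (1.24 s at N = 40 and 662 s at N = 70) give `extrapolate_time([40, 70], [1.24, 662.0], 100)` of about 3.5 × 10^5 s.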



Table 5.1 Average computational complexity for different eigenvalue distributions. Times in parentheses
represent extrapolated values.

                         time [s]                                  number of nodes
eig. dist.  κ(Q)   N     BB-C          BB-D          CPLEX         BB-C          BB-D          CPLEX
1/λ         N      40    1.24          0.70          18.38         810           599           5979
                   70    662           75            2146          2.60 × 10"    0.69 × 10"    2.72 × 10^
                   100   (4 × 10')     1.09 × 10"    (2 × 10^)     -             7.40 × 10"    -
            100N   40    0.84          0.67          (2 × 10')     628           611           -
                   70    334           213           -             1.85 × 10"    1.46 × 10"    -
                   100   (1 × 10^)     (5 × 10")     -             -             -             -
uniform     N      40    1.09          0.72          15.28         689           616           5500
                   70    261           98            1159          1.77 × 10"    1.01 × 10"    2.03 × 10^
                   100   (7 × 10")     1.43 × 10"    (7 × 10")     -             1.36 × 10'    -
            100N   40    0.18          0.19          (3 × 10")     189           189           -
                   70    3.64          2.95          -             1.69 × 10^    1.69 × 10^^   -
                   100   98.6          77.1          -             9.71 × 10'    9.95 × 10"    -
1/λ²        N      40    1.93          0.51          23.65         1111          438           6929
                   70    2949          12            3139          4.72 × 10"    0.19 × 10"    3.44 × 10^
                   100   (6 × 10**)    633           (4 × 10^)     -             1.40 × 10"    -
            100N   40    1.12          0.40          (3 × 10')     742           328           -
                   70    1756          14            -             4.19 × 10"    0.23 × 10"    -
                   100   (1 × 10*)     848           -             -             1.60 × 10"    -

Considering first the comparison between BB-C and BB-D, it is clear from Table
5.1 that diagonal relaxations can significantly decrease complexity. The gains
generally increase with the dimension and can reach several orders of magnitude for
the 1/λ² eigenvalue distribution, which as seen in Sect. 5.1 is most favorable toward
diagonal relaxations. Even for a uniform distribution and κ(Q) = 100N, BB-D is
slightly more efficient than BB-C, in apparent contradiction with the comparison in
Fig. 5.1(b). This can be explained by noting that Fig. 5.1(b) represents the average
approximation ratios for the root node whereas subproblems may have more
non-uniform eigenvalue distributions and lower condition numbers. It is also interesting
that instances in this class appear to be the easiest to solve.

The comparison with CPLEX in Table 5.1 shows the value of a more specialized
algorithm for solving (1.1), as has been observed by others [2, 14]. This is in spite of
the fact that CPLEX is run as a compiled executable with full multicore capabilities.
Indeed, the advantage extends to the BB-C variant at low N, although the margin
decreases at higher N. Note also that CPLEX has difficulty with the more
poorly-conditioned instances. Given CPLEX's use of techniques beyond pure
branch-and-bound, it is difficult to identify precisely the reasons for its relative inefficiency. One
factor is the poor performance of the heuristic used by CPLEX relative to the
backward selection heuristic in BB. For this reason, CPLEX is initialized with the BB
solution in the experiments. As for lower bounds, it is likely that checking condition
(2.4) confers significant benefits because of the ability to eliminate many
infeasible subproblems and improve bounds incrementally with minimal computation, and
also because of the subsequent reduction in dimension. Another difference is the
frequency at which relaxations are solved since in BB, some effort is made to avoid
solving unprofitable relaxations.

Table 5.2 shows a complexity comparison for Q matrices with unit diagonal
entries and uniformly distributed off-diagonal entries, corresponding to Fig. 5.1(d) in
Sect. 5.1. The difference between BB-C and BB-D in this case is as dramatic as it
is for the 1/λ² eigenvalue distribution in Table 5.1. The performance of CPLEX is
similar to its performance in Table 5.1 for κ(Q) = N. It is clear that BB-D remains
the best option.



Table 5.2 Average computational complexity for different off-diagonal amplitudes a. Times in parentheses
represent extrapolated values.

             time [s]                                  number of nodes
a      N     BB-C          BB-D          CPLEX         BB-C          BB-D          CPLEX
0.2    40    1.66          0.13          26.87         1128          93            8698
       70    2941          1.1           4107          6.76 × 10''   151           4.85 × 10'
       100   (7 × 10'^)    2.6           (9 × 10')     -             187           -
0.8    40    1.56          0.76          24.04         849           543           7896
       70    577           50            2853          3.21 × 10''   0.57 × 10*    3.86 × 10'
       100   (4 × 10^)     4.86 × 10^    (4 × 10')     -             7.51 × 10**   -



6 Conclusion and future work 

Two relaxations of a quadratically-constrained cardinality minimization problem (1.1)
were investigated, the first being the continuous relaxation of a mixed integer formu- 
lation, the second an optimized diagonal relaxation based on a simple special case of 
the problem. An absolute upper bound on the optimal cost of the continuous relax- 
ation suggests that it yields relatively weak approximations. In computational exper- 
iments, diagonal relaxations were seen to result in stronger bounds and significantly 
reduced complexity in solving (1.1) via branch-and-bound. Substantial gains were
also observed relative to the general-purpose solver CPLEX. To support these numer- 
ical results, this paper analyzed the approximation properties of diagonal relaxations, 
providing general insight and establishing guarantees in terms of the eigenvalues of 
the matrix Q and in the diagonally dominant and nearly coordinate-aligned cases. 

Given the interest in generalizations of (1.1) in portfolio optimization, it is hoped
that the analysis in this paper could be extended to these more general formulations
and to other relaxations such as the perspective relaxation [12, 16, 29]. In addition,
the positive experience with diagonal relaxations motivates further exploration of
relaxations based on other efficiently solvable special cases, for example those in